---
title: |
  |
  | 
  | \vspace{1cm}\pagenumbering{gobble} Believing and Sharing Information by Fake Sources: An Experiment\vspace{0.5cm}
  |
author: |
  | Paul C. Bauer (MZES) & Bernhard Clemm von Hohenberg (EUI)^[CONTACT: Dr. Paul C. Bauer, mail@paulcbauer.eu, University Mannheim, Mannheim Centre for European Social Research (MZES), 68131 Mannheim, Germany] ^[AFFILIATIONS: Mannheim Centre for European Social Research (MZES), University Mannheim, Mannheim, Germany, European University Institute, Florence, Italy] ^[Both authors contributed equally to this work. Code and data to reproduce the study are available here: https://doi.org/10.7910/DVN/EJYWCK. Acknowledgments: We are grateful to the participants of several workshops and colloquiums at the European University Institute (May 2019), ISPP Lisbon (July 2019), Massachusetts Institute of Technology (MIT) (October 2019) and New York University (December 2019) for feedback. We are especially grateful to David Rand, Gordon Pennycook, Joshua Tucker, John Jost, Johanna Gereke, Wiebke Drews, Simon Munzert, Max Schaub, Anselm Hager, Tobias Widmann, Christian Clemm, Florian Keusch, Matthias Fatke, Kathrin Ackermann, Christine Emmer, Nan Zhang and several anonymous reviewers for excellent feedback.]\vspace{1cm}
  |   
date: |
  |       
  |
  | Published version (*Political Communication*): http://dx.doi.org/10.1080/10584609.2020.1840462
  |
abstract: \noindent\setstretch{1}The increasing spread of false stories (“fake news”) represents one of the great challenges societies face in the 21st century. A little-understood aspect of this phenomenon and of the processing of online news in general is how sources influence whether people believe and share what they read. In contrast to the predigital era, the Internet makes it easy for anyone to imitate well-known and credible sources in name and appearance. In a preregistered survey experiment, we first investigate the effect of this contrast (real vs. fake source) and find that subjects, as expected, have a higher tendency to believe and a somewhat higher propensity to share news by real sources. We then expose subjects to a number of reports manipulated in content (congruent vs. incongruent with individuals' attitudes), which reveals our most crucial finding. As predicted, people are more likely to believe a news report by a source that has previously given them congruent information. However, this only holds if the source is fake. We further use machine learning to uncover treatment heterogeneity. Effects vary most strongly for different levels of trust in the mainstream media and having voted for the populist right.\vspace{.8cm}
colorlinks: true
output: 
  bookdown::pdf_document2:
    toc: no
    keep_tex: true
    pandoc_args: --lua-filter=multiple-bibliographies.lua
    latex_engine: xelatex
geometry: [top=0.85in,left=1in,right=1in, footskip=0.40in]
fontsize: [12pt,letterpaper]
linestretch: 1
documentclass: article
header-includes:
   - \usepackage{dcolumn}
   - \usepackage{color}
   - \usepackage{floatrow}
   - \floatsetup[figure]{capposition=top}
   - \floatsetup[table]{capposition=top}
   - \floatplacement{figure}{H}
   - \floatplacement{table}{H}
   - \usepackage{booktabs}
   - \usepackage{longtable}
   - \usepackage{array}
   - \usepackage{multirow}
   - \usepackage{wrapfig}
   - \usepackage{float}
   - \usepackage{colortbl}
   - \usepackage{pdflscape}
   - \usepackage{tabu}
   - \usepackage{threeparttable}
   - \usepackage{threeparttablex}
   - \usepackage[normalem]{ulem}
   - \usepackage{makecell}
   - \usepackage[hang]{footmisc}
   - \usepackage{setspace}
   - \usepackage{multibib}
csl: chicago-author-date.csl
bibliography_main:
  - partial_bib_1.bib
bibliography_app:
  - partial_bib_2.bib
link-citations: yes
linkcolor: blue
---

```{r setup, include=FALSE}

knitr::opts_chunk$set(
  cache = FALSE,
  echo = FALSE,
  concordance = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.pos = "H"
)
options(kableExtra.latex.load_packages = FALSE) # do not let kableExtra load LaTeX packages (they are listed in header-includes)

# Packages
library(pacman)

pacman::p_load(
  tidyverse,
  magrittr,
  broom,
  ggpubr,
  grf,
  gridExtra,
  kableExtra,
  ggcorrplot,
  stargazer,
  gtrendsR,
  lubridate,
  ggnewscale,
  DiagrammeRsvg
)


# Custom functions: Add alpha to colors
add.alpha <- function(col, alpha = 1) {
  if (missing(col)) {
    stop("Please provide a vector of colors.")
  }
  as.character(apply(
    sapply(col, col2rgb) / 255, 2,
    function(x) {
      rgb(x[1], x[2], x[3], alpha = alpha)
    }
  ))
}
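
# Sketch (illustrative, not part of the original pipeline): base R offers
# grDevices::adjustcolor(), which can set the alpha channel in one call.
# `add_alpha_base()` is a hypothetical wrapper name; it is only defined here
# and never called, so the rest of the script keeps using add.alpha().
add_alpha_base <- function(col, alpha = 1) {
  if (missing(col)) {
    stop("Please provide a vector of colors.")
  }
  # alpha.f multiplies the existing alpha channel (1 for opaque named colors)
  grDevices::adjustcolor(col, alpha.f = alpha)
}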




```


```{r data-import, paged.print=FALSE}

# Delete information that could deanonymize respondents (open-ended answers)
# data <- read_csv("input/data_final.csv")
# data$Q123[842] <- gsub("xxxxx@xxxx.xx", "", data$Q123[842]) # remove email address
# data$Q123[701]  <- gsub("XXXXXX", "", data$Q123[701]) # replace website
# write_csv(data, "input/data_final.csv")


data <- read_csv("input/data_final.csv") # import data
data <- data[-1:-2, ] # delete the first two rows, which relate to variable names
data <- data[!is.na(data$tic), ] # exclude one respondent without ID who aborted the survey and was not deleted after the pre-launch

# Completion ####

## Who completed, was screened out of, speeded or aborted survey?

data <- data %>% rename(completion = Q_TerminateFlag)

# table(data$completion, exclude = NULL)
# table(data$completion[data$completion == "Screened" & data$QID731_1 < 18])
# table(data$QID213, exclude = NULL)

# 3168 observations of which
# 1063 tagged, of which
# 308 speeder (312 "Screened" minus 4 under 18 years)
# 4 under 18 years
# 751 because of full quota
# 2105 not screened out
# 2288 observations that confirmed debrief, of which
# 308 speeder
# 1980 valid completes, which equals sample target size
# Difference between 2105 and 1980 presumably subjects that aborted survey

## Recode Completion variable

data <- data %>%
  mutate(
    completion =
      case_when(
        completion == "Screened" & QID731_1 < 18 ~ "Under 18",
        completion == "Screened" ~ "Speeder",
        is.na(completion) & is.na(QID213) ~ "Aborted",
        is.na(completion) & !is.na(QID213) ~ "Complete",
        TRUE ~ as.character(completion)
      )
  )
```

```{r data-covariates, paged.print=FALSE}

# Consent ####

data <- data %>% rename(consent = QID212)

# Sex ####

data <- data %>% rename(sex = QID160)
data$sex_fac <- as.factor(data$sex)
data$sex_num <- as.numeric(data$sex_fac)
data$sex_num[data$sex_num == 2] <- 0

# Age and age group  ####

data <- data %>% rename(age = QID731_1)
data$age <- as.numeric(data$age)
data <- data %>% mutate(age_cat = cut(age,
  breaks = c(0, 30.5, 43.5, 56.5, 69.5, 120),
  labels = c(
    "18-30ys",
    "31-43ys",
    "44-56ys",
    "57-69ys",
    "70+"
  )
))

# Federal state   ####

data <- data %>% rename(federal_state = QID222_1)

## Federal state west (Berlin = West)

data <- data %>% mutate(federal_state_west = ifelse(federal_state == "Brandenburg" |
  federal_state == "Mecklenburg-Vorpommern" |
  federal_state == "Sachsen" |
  federal_state == "Sachsen-Anhalt" |
  federal_state == "Thueringen", 0, 1))

# Income ####

data <- data %>%
  rename(income = QID208) %>%
  mutate(income = gsub("Monatlich |Monatich ", "", income))

## Create english version
data$income_fac <- factor(data$income,
  ordered = TRUE,
  levels = c(
    "0 bis 500 Euro",
    "501 bis 1.000 Euro",
    "1.001 bis 1.500 Euro",
    "1.501 bis 2.000 Euro",
    "2.001 bis 2.500 Euro",
    "2.501 bis 3.000 Euro",
    "3.001 bis 3.500 Euro",
    "3.501 bis 4.000 Euro",
    "4.001 bis 4.500 Euro",
    "4.501 bis 5.000 Euro",
    "5.001 Euro oder mehr"
  )
)
levels(data$income_fac) <- gsub("bis", "to", levels(data$income_fac))
levels(data$income_fac) <- gsub("oder mehr", "or more", levels(data$income_fac))
levels(data$income_fac) <- gsub(" Euro", "€", levels(data$income_fac))
levels(data$income_fac) <- gsub("\\.", ",", levels(data$income_fac))

## Numeric version
data$income_num <- as.numeric(data$income_fac)

# Education ####

data <- data %>%
  rename(education = QID220) %>%
  mutate(
    education =
      case_when(
        education == "Fach-/Hochschulstudium"
        ~ "Fachhochschule or Hochschule",
        education == "Abitur, allgemeine oder fachgebundene Hochschulreife (bzw. Erweiterte Oberschule der ehem. DDR mit Abschluss 12. Klasse)"
        ~ "Abitur",
        education == "Mittlere Reife, Realschulabschluss, Fachoberschulreife, Mittlerer Schulabschluss (bzw. Polytechnische Oberschule der ehem. DDR mit Abschluss der 10. Klasse)"
        ~ "Mittlere Reife, Realschulabschluss, Fachoberschulreife",
        education == "Volks- oder Hauptschulabschluss (bzw. Polytechnische Oberschule der ehem. DDR mit Abschluss der 8. oder 9. Klasse)"
        ~ "Finished Volksschule or Hauptschule",
        education == "Fachhochschulreife (Abschluss einer Fachoberschule etc.)"
        ~ "Fachhochschulreife",
        education == "(Noch) kein Abschluss, aber Grundschule beendet"
        ~ "Finished Grundschule",
        education == "Grundschule nicht beendet" ~ "None"
      )
  ) %>%
  mutate(education = factor(education,
    ordered = TRUE,
    levels = c(
      "None",
      "Finished Grundschule",
      "Finished Volksschule or Hauptschule",
      "Mittlere Reife, Realschulabschluss, Fachoberschulreife",
      "Fachhochschulreife",
      "Abitur",
      "Fachhochschule or Hochschule"
    )
  ))


## Another english version
# "Grundschule nicht beendet" and "(Noch) kein Abschluss, aber Grundschule beendet" as NA

data <- data %>%
  mutate(
    education_fac =
      case_when(
        education == "Fachhochschule or Hochschule"
        ~ "University",
        education == "Abitur"
        ~ "High school",
        education == "Fachhochschulreife" ~ "Tech. high school",
        education == "Mittlere Reife, Realschulabschluss, Fachoberschulreife"
        ~ "Middle school",
        education == "Finished Volksschule or Hauptschule" ~ "Elem. school",
        education == "Finished Grundschule" | education == "None"
        ~ ""
      )
  ) %>%
  mutate(education_fac = factor(education_fac,
    ordered = TRUE,
    levels = c(
      "Elem. school",
      "Middle school",
      "Tech. high school",
      "High school",
      "University"
    )
  ))

## Numeric version

data$education_num <- as.numeric(data$education_fac)

# Turnout ####

data <- data %>%
  rename(turnout = QID221) %>%
  mutate(turnout_rec = case_when(
    turnout == "Ich war nicht wahlberechtigt" ~ "Nein",
    turnout == "Weiss ich nicht" ~ "",
    TRUE ~ as.character(turnout)
  )) %>%
  mutate(turnout_fac = factor(turnout_rec,
    ordered = TRUE,
    levels = c("Nein", "Ja")
  )) %>%
  mutate(turnout_num = as.numeric(turnout_fac))


# Vote choice ####

data <- data %>% rename(vote_choice = QID219)
data$vote_choice_fac <- factor(data$vote_choice)
data <- data %>% rename(vote_hyp_1 = QID754)
data$vote_hyp_1_fac <- factor(data$vote_hyp_1)
data <- data %>% rename(vote_hyp_2 = QID756)
data$vote_hyp_2_fac <- factor(data$vote_hyp_2)

data <- data %>%
  unite("vote_choice_all",
    c(vote_choice, vote_hyp_1, vote_hyp_2),
    sep = "_", remove = FALSE
  ) %>%
  mutate(vote_choice_all = gsub("NA_|_NA", "", vote_choice_all))

data$vote_choice_all_fac <- factor(data$vote_choice_all,
  ordered = TRUE,
  levels = c(
    "Die Linke",
    "Buendnis 90/Die Gruenen",
    "SPD",
    "CDU/CSU",
    "FDP",
    "AfD",
    "Weiss ich nicht",
    NA
  )
)
data$vote_choice_all_num <- as.numeric(data$vote_choice_all_fac)
levels(data$vote_choice_all_fac) <- c(
  "LeftParty",
  "Greens",
  "SPD",
  "CDU/CSU",
  "FDP",
  "AfD",
  "Dont know"
)

# Create party dummies

data$vote_choice_leftparty <- ifelse(data$vote_choice_all_fac == "LeftParty", 1, 0)
data$vote_choice_greens <- ifelse(data$vote_choice_all_fac == "Greens", 1, 0)
data$vote_choice_spd <- ifelse(data$vote_choice_all_fac == "SPD", 1, 0)
data$vote_choice_cdu_csu <- ifelse(data$vote_choice_all_fac == "CDU/CSU", 1, 0)
data$vote_choice_fdp <- ifelse(data$vote_choice_all_fac == "FDP", 1, 0)
data$vote_choice_afd_num <- ifelse(data$vote_choice_all_fac == "AfD", 1, 0)
data$vote_choice_dont_know <- ifelse(data$vote_choice_all_fac == "Dont know", 1, 0)

data$vote_choice_leftparty_fac <- factor(data$vote_choice_leftparty, levels = c(0, 1), labels = c("Other", "Yes"))
data$vote_choice_greens_fac <- factor(data$vote_choice_greens, levels = c(0, 1), labels = c("Other", "Yes"))
data$vote_choice_spd_fac <- factor(data$vote_choice_spd, levels = c(0, 1), labels = c("Other", "Yes"))
data$vote_choice_cdu_csu_fac <- factor(data$vote_choice_cdu_csu, levels = c(0, 1), labels = c("Other", "Yes"))
data$vote_choice_fdp_fac <- factor(data$vote_choice_fdp, levels = c(0, 1), labels = c("Other", "Yes"))
data$vote_choice_afd_fac <- factor(data$vote_choice_afd_num, levels = c(0, 1), labels = c("Other", "Yes"))
data$vote_choice_dont_know_fac <- factor(data$vote_choice_dont_know, levels = c(0, 1), labels = c("Other", "Yes"))

# HTML knowledge ####

data <- data %>% rename(know_html = QID295)
data$know_html_fac <- factor(data$know_html)
data <- data %>% mutate(know_html_correct = ifelse(know_html == "HTML", 1, 0))

# Debriefing ####

## Confirmation
data <- data %>% rename(debrief_more_info = Q127_1)
data <- data %>% mutate(debrief_more_info_num = ifelse(is.na(debrief_more_info), 0, 1))

## Open feedback
data <- data %>% rename(debrief_feedback = Q123)
data <- data %>% rename(debrief_confirm = QID213)
data <- data %>% mutate(debrief_confirm_num = ifelse(is.na(debrief_confirm), 0, 1))

# Source knowing, reading, trusting ####

sources <- c(
  "tagesschau",
  "heute",
  "sz",
  "faz",
  "focus",
  "bild",
  "nachrichten360",
  "berliner",
  "spiegel",
  "rtdeutsch",
  "newsblitz"
)

## Recognize source ####

data <- data %>% rename(
  know_tagesschau = QID189_1,
  know_heute = QID189_2,
  know_sz = QID189_3,
  know_faz = QID189_4,
  know_focus = QID189_5,
  know_bild = QID189_6,
  know_nachrichten360 = QID189_7,
  know_berliner = QID189_8,
  know_spiegel = QID189_9,
  know_rtdeutsch = QID189_10,
  know_newsblitz = QID189_11
)

for (source in sources) {
  know_char <- paste("know_", source, sep = "")
  know_fac <- paste("know_", source, "_fac", sep = "")
  know_num <- paste("know_", source, "_num", sep = "")
  know_char_vec <- data %>% pull(know_char)
  data <- data %>% mutate(!!know_fac := factor(know_char_vec))
  data <- data %>% mutate(!!know_num := ifelse(know_char_vec == "Ja", 1, 0))
  rm(know_char, know_fac, know_num, know_char_vec, source)
}
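
# Sketch (illustrative, not part of the original pipeline): the 0/1 recode in
# the loop above could also be written without a loop via dplyr::across().
# Assumes dplyr >= 1.0.0. `recode_yes_no()` is a hypothetical helper name; it
# is only defined here and never called, so `data` is left unchanged.
recode_yes_no <- function(df, cols) {
  dplyr::mutate(
    df,
    dplyr::across(
      dplyr::all_of(cols),
      ~ ifelse(.x == "Ja", 1, 0), # 1 for "Ja", 0 otherwise, NA stays NA
      .names = "{.col}_num"        # e.g. know_faz -> know_faz_num
    )
  )
}
# Example call, left commented out on purpose:
# data <- recode_yes_no(data, paste0("know_", sources))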

## Read/watched source ####

data <- data %>% rename(
  read_tagesschau = QID712_1,
  read_heute = QID712_2,
  read_sz = QID712_3,
  read_faz = QID712_4,
  read_focus = QID712_5,
  read_bild = QID712_6,
  read_nachrichten360 = QID712_7,
  read_berliner = QID712_8,
  read_spiegel = QID712_9,
  read_rtdeutsch = QID712_10,
  read_newsblitz = QID712_11
)

for (source in sources) {
  read_char <- paste("read_", source, sep = "")
  read_fac <- paste("read_", source, "_fac", sep = "")
  read_num <- paste("read_", source, "_num", sep = "")
  read_char_vec <- data %>% pull(read_char)
  data <- data %>% mutate(!!read_fac := factor(read_char_vec))
  data <- data %>% mutate(!!read_num := ifelse(read_char_vec == "Ja", 1, 0))
  rm(source, read_char, read_fac, read_num, read_char_vec)
}

## Trust source ####

data <- data %>% rename(
  trust_tagesschau = Q126,
  trust_heute = Q128,
  trust_sz = Q129,
  trust_faz = Q130,
  trust_focus = Q131,
  trust_bild = Q132,
  trust_nachrichten360 = Q133,
  trust_berliner = Q134,
  trust_spiegel = Q135,
  trust_rtdeutsch = Q136,
  trust_newsblitz = Q137
)

for (source in sources) {
  trust_char <- paste("trust_", source, sep = "")
  trust_fac <- paste("trust_", source, "_fac", sep = "")
  trust_num <- paste("trust_", source, "_num", sep = "")
  trust_char_vec <- data %>% pull(trust_char)
  data <- data %>% mutate(!!trust_fac := factor(trust_char_vec,
    ordered = TRUE,
    levels = c(
      "Ueberhaupt nicht",
      "Eher nicht",
      "Teils/teils",
      "Eher",
      "Voll und ganz"
    )
  ))
  data <- data %>% mutate(!!trust_num := case_when(
    trust_char_vec == "Ueberhaupt nicht" ~ 0,
    trust_char_vec == "Eher nicht" ~ 1,
    trust_char_vec == "Teils/teils" ~ 2,
    trust_char_vec == "Eher" ~ 3,
    trust_char_vec == "Voll und ganz" ~ 4
  ))
  rm(trust_char, trust_fac, trust_num, trust_char_vec, source)
}

## Trust source if source is known
### Numeric: equals the trust score if the subject knows the source, NA otherwise

for (source in sources) {
  trust <- paste("trust_", source, "_num", sep = "")
  know <- paste("know_", source, "_num", sep = "")
  trust_known <- paste("trust_", source, "_known_num", sep = "")
  trust_vec <- data %>% pull(trust)
  know_vec <- data %>% pull(know)
  data <- data %>% mutate(!!trust_known := ifelse(know_vec == 1, trust_vec, NA))
  rm(source, trust, know, trust_known, trust_vec, know_vec)
}

# Mainstream source knowledge index ####

know_sources_mainstream <- c(
  "know_tagesschau_num",
  "know_heute_num",
  "know_sz_num",
  "know_faz_num",
  "know_focus_num",
  "know_bild_num",
  "know_spiegel_num"
)

data <- data %>%
  ungroup() %>%
  mutate(
    know_source_mainstream =
      select(., all_of(know_sources_mainstream)) %>%
        rowMeans(na.rm = TRUE)
  )

# Mainstream source trust index ####

trust_sources_mainstream <- c(
  "trust_tagesschau_num",
  "trust_heute_num",
  "trust_sz_num",
  "trust_faz_num",
  "trust_focus_num",
  "trust_bild_num",
  "trust_spiegel_num"
)

data <- data %>%
  ungroup() %>%
  mutate(
    trust_source_mainstream =
      select(., all_of(trust_sources_mainstream)) %>%
        rowMeans(na.rm = TRUE)
  )

# Usage: email, Facebook, Twitter, Whatsapp ####

services <- c(
  "email",
  "fb",
  "twitter",
  "whatsapp"
)

## Service account

data <- data %>% rename(
  account_email = QID634,
  account_fb = QID635,
  account_twitter = QID637,
  account_whatsapp = QID636
)

for (i in services) {
  account <- paste("account_", i, sep = "")
  account_fac <- paste("account_", i, "_fac", sep = "")
  account_vec <- data %>% pull(account)
  data <- data %>% mutate(!!account_fac := factor(account_vec,
    levels = c(
      "Nein",
      "Ja, aber ich nutze es nicht",
      "Ja"
    )
  ))
  rm(account, account_fac, account_vec)
}
levels(data$account_email_fac) <- c("No", "Yes, but I dont use it", "Yes")
levels(data$account_fb_fac) <- c("No", "Yes, but I dont use it", "Yes")
levels(data$account_twitter_fac) <- c("No", "Yes, but I dont use it", "Yes")
levels(data$account_whatsapp_fac) <- c("No", "Yes, but I dont use it", "Yes")

## Dummy

data <- data %>%
  mutate(use_email = case_when(
    account_email == "Ja, aber ich nutze es nicht" ~ "Nein",
    TRUE ~ as.character(account_email)
  ))
data$use_email_fac <- factor(data$use_email,
  ordered = TRUE,
  levels = c("Nein", "Ja")
)
data$use_email_num <- as.numeric(data$use_email_fac)

data <- data %>% mutate(use_fb = case_when(
  account_fb == "Ja, aber ich nutze es nicht" ~ "Nein",
  TRUE ~ as.character(account_fb)
))
data$use_fb_fac <- factor(data$use_fb,
  ordered = TRUE,
  levels = c("Nein", "Ja")
)
data$use_fb_num <- as.numeric(data$use_fb_fac)

data <- data %>% mutate(use_twitter = case_when(
  account_twitter == "Ja, aber ich nutze es nicht" ~ "Nein",
  TRUE ~ as.character(account_twitter)
))
data$use_twitter_fac <- factor(data$use_twitter,
  ordered = TRUE,
  levels = c("Nein", "Ja")
)
data$use_twitter_num <- as.numeric(data$use_twitter_fac)

data <- data %>% mutate(use_whatsapp = case_when(
  account_whatsapp == "Ja, aber ich nutze es nicht" ~ "Nein",
  TRUE ~ as.character(account_whatsapp)
))
data$use_whatsapp_fac <- factor(data$use_whatsapp,
  ordered = TRUE,
  levels = c("Nein", "Ja")
)
data$use_whatsapp_num <- as.numeric(data$use_whatsapp_fac)

# Sharing frequency ####

data <- data %>% rename(sharing_email = QID744)
data <- data %>% rename(sharing_fb = QID745)
data <- data %>% rename(sharing_twitter = QID746)
data <- data %>% rename(sharing_whatsapp = QID747)

for (i in services) {
  sharing <- paste("sharing_", i, sep = "")
  sharing_fac <- paste("sharing_", i, "_fac", sep = "")
  sharing_num <- paste("sharing_", i, "_num", sep = "")
  sharing_vec <- data %>% pull(sharing)
  data <- data %>% mutate(!!sharing_fac := factor(sharing_vec,
    ordered = TRUE,
    levels = c(
      "Seltener",
      "Ein paar Mal im Jahr",
      "Ein paar Mal im Monat",
      "Einmal pro Woche",
      "Taeglich"
    )
  ))
  data[[sharing_num]] <- as.numeric(data[[sharing_fac]])
  data[[sharing_num]] <- data[[sharing_num]] - 1
  rm(i, sharing, sharing_fac, sharing_num, sharing_vec)
}

# Composite index

data <- data %>% mutate(
  sharing_frequency =
    select(., c(
      "sharing_email_num",
      "sharing_fb_num",
      "sharing_twitter_num",
      "sharing_whatsapp_num"
    )) %>% rowMeans(na.rm = TRUE)
)


## Political Knowledge ####

politicians <- c(
  "merkel",
  "altmaier",
  "schulz",
  "maas",
  "lindner",
  "hofreiter",
  "goering",
  "bartsch",
  "weidel"
)

data <- data %>% rename(
  know_merkel = QID721_1,
  know_altmaier = Q131_1,
  know_schulz = Q139_1,
  know_maas = Q135_1,
  know_lindner = Q137_1,
  know_hofreiter = Q141_1,
  know_goering = Q143_1,
  know_bartsch = Q145_1,
  know_weidel = Q147_1
)

data <- data %>%
  mutate(know_merkel_correct = ifelse(know_merkel == "CDU", 1, 0)) %>%
  mutate(know_altmaier_correct = ifelse(know_altmaier == "CDU", 1, 0)) %>%
  mutate(know_schulz_correct = ifelse(know_schulz == "SPD", 1, 0)) %>%
  mutate(know_maas_correct = ifelse(know_maas == "SPD", 1, 0)) %>%
  mutate(know_lindner_correct = ifelse(know_lindner == "FDP", 1, 0)) %>%
  mutate(know_hofreiter_correct = ifelse(know_hofreiter == "Die Gruenen", 1, 0)) %>%
  mutate(know_goering_correct = ifelse(know_goering == "Die Gruenen", 1, 0)) %>%
  mutate(know_bartsch_correct = ifelse(know_bartsch == "Die Linke", 1, 0)) %>%
  mutate(know_weidel_correct = ifelse(know_weidel == "AfD", 1, 0))

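# Note: the pattern "correct" also matches know_html_correct (created above),
# so that item is counted in the total alongside the nine politician items.
data$know_politicians_total <- rowSums(data[, grep("correct", names(data), value = TRUE)],
  na.rm = TRUE
)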

## Overconfidence ####

data <- data %>% rename(confidence = QID734)
data$confidence <- as.numeric(data$confidence)
data$overconfidence <- (data$confidence - data$know_politicians_total)
```

```{r data-attitudes, paged.print=FALSE}

issues <- c(
  "culture",
  "economy",
  "security",
  "life",
  "border"
)

# Single items ####
## Low value is an anti-migration position, high value a pro-migration position

data <- data %>% rename(
  immigrant_culture = QID183,
  immigrant_economy = QID184,
  immigrant_security = QID185,
  immigrant_life = QID186,
  immigrant_border = QID218
)

for (issue in issues) {
  attitude <- paste("immigrant_", issue, sep = "")
  attitude_num <- paste("immigrant_", issue, "_num", sep = "")
  attitude_vec <- data %>% pull(attitude)
  data <- data %>% mutate(!!attitude_num :=
    case_when(
      grepl("^0\\s+", attitude_vec) == TRUE ~ "0",
      grepl("^10\\s+", attitude_vec) == TRUE ~ "10",
      TRUE ~ as.character(attitude_vec)
    ))
  data[[attitude_num]] <- as.numeric(data[[attitude_num]])
  rm(issue, attitude, attitude_num, attitude_vec)
}

# Composite indices ####

## Migration attitude index: Average of all five migration attitudes
# Low value is an anti-migration position, high value a pro-migration position

data <- data %>%
  rowwise() %>%
  mutate(migration_attitude_average = mean(c(
    immigrant_culture_num,
    immigrant_economy_num,
    immigrant_security_num,
    immigrant_life_num,
    immigrant_border_num
  ), na.rm = TRUE))

# Cronbach's alpha ####

cronbachs_alpha <- data %>%
  dplyr::select(intersect(
    starts_with("immigrant"),
    ends_with("num")
  )) %>%
  as.data.frame() %>%
  psych::alpha()
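
# Sketch (illustrative, not part of the original pipeline): Cronbach's alpha
# from its textbook formula,
#   alpha = k / (k - 1) * (1 - sum(item variances) / variance of sum score).
# `cronbach_alpha_manual()` is a hypothetical helper; it is only defined here
# and never called. It uses listwise deletion, so its value may differ
# slightly from psych::alpha() when items contain missing values.
cronbach_alpha_manual <- function(items) {
  items <- stats::na.omit(as.data.frame(items)) # keep complete rows only
  k <- ncol(items) # number of items
  item_vars <- vapply(items, stats::var, numeric(1)) # variance of each item
  total_var <- stats::var(rowSums(items)) # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}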

# Migration attitude dummies ####
## cutoffs depend on which observations are included in the data set
## For the main analyses, define subset as only "Completes"


## Define subset
subset <- "Complete"

cutoffs <- data %>%
  group_by(completion) %>%
  summarize(
    mean = mean(migration_attitude_average, na.rm = TRUE),
    lower = quantile(migration_attitude_average, .3333, na.rm = TRUE),
    upper = quantile(migration_attitude_average, .6666, na.rm = TRUE)
  ) %>%
  filter(completion == !!subset)

## Migration attitude dummy 50: 1 if above mean on migration index
data <- data %>% mutate(
  migration_attitude_dummy_50 =
    ifelse(migration_attitude_average > cutoffs$mean, 1, 0)
)

## Migration attitude dummy 30: 1 if in upper tertile on migration index, 0 if in lower tertile
data <- data %>% mutate(
  migration_attitude_dummy_30 =
    case_when(
      migration_attitude_average <= cutoffs$lower ~ 0,
      migration_attitude_average >= cutoffs$upper ~ 1
    )
)
```

```{r data-outcomes, paged.print=FALSE}

# Belief and sharing report 1 ####
colnames(data)[colnames(data) == "QID640"] <- "belief_report_1_desk"
colnames(data)[colnames(data) == "QID641_1"] <- "share_report_1_email_desk"
colnames(data)[colnames(data) == "QID641_2"] <- "share_report_1_fb_desk"
colnames(data)[colnames(data) == "QID641_3"] <- "share_report_1_twitter_desk"
colnames(data)[colnames(data) == "QID641_4"] <- "share_report_1_whatsapp_desk"

colnames(data)[colnames(data) == "QID738"] <- "belief_report_1_mob"
colnames(data)[colnames(data) == "QID646_1"] <- "share_report_1_email_mob"
colnames(data)[colnames(data) == "QID646_2"] <- "share_report_1_fb_mob"
colnames(data)[colnames(data) == "QID646_3"] <- "share_report_1_twitter_mob"
colnames(data)[colnames(data) == "QID646_4"] <- "share_report_1_whatsapp_mob"

data <- tidyr::unite(data, "report_1_belief", c("belief_report_1_desk", "belief_report_1_mob"))
data$report_1_belief <- gsub("NA_|_NA", "", data$report_1_belief)
data$belief_report_1_fac <- factor(data$report_1_belief,
  levels = c(
    "0 - Gar nicht",
    "1", "2", "3", "4", "5",
    "6 - Voll und ganz"
  ),
  ordered = TRUE
)
data$belief_report_1_num <- data$report_1_belief
data$belief_report_1_num[data$belief_report_1_num == "0 - Gar nicht"] <- 0
data$belief_report_1_num[data$belief_report_1_num == "6 - Voll und ganz"] <- 6
data$belief_report_1_num <- as.numeric(data$belief_report_1_num)

data <- tidyr::unite(
  data, "share_report_1_email",
  c(
    "share_report_1_email_desk",
    "share_report_1_email_mob"
  )
)
data$share_report_1_email <- gsub("NA_|_NA", "", data$share_report_1_email)
data$share_report_1_email_fac <- factor(data$share_report_1_email)
data$share_report_1_email_num <- NA
data$share_report_1_email_num[data$share_report_1_email == "Nein"] <- 0
data$share_report_1_email_num[data$share_report_1_email == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_1_fb",
  c(
    "share_report_1_fb_desk",
    "share_report_1_fb_mob"
  )
)
data$share_report_1_fb <- gsub("NA_|_NA", "", data$share_report_1_fb)
data$share_report_1_fb_fac <- factor(data$share_report_1_fb)
data$share_report_1_fb_num <- NA
data$share_report_1_fb_num[data$share_report_1_fb == "Nein"] <- 0
data$share_report_1_fb_num[data$share_report_1_fb == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_1_twitter",
  c("share_report_1_twitter_desk", "share_report_1_twitter_mob")
)
data$share_report_1_twitter <- gsub("NA_|_NA", "", data$share_report_1_twitter)
data$share_report_1_twitter_fac <- factor(data$share_report_1_twitter)
data$share_report_1_twitter_num <- NA
data$share_report_1_twitter_num[data$share_report_1_twitter == "Nein"] <- 0
data$share_report_1_twitter_num[data$share_report_1_twitter == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_1_whatsapp",
  c("share_report_1_whatsapp_desk", "share_report_1_whatsapp_mob")
)
data$share_report_1_whatsapp <- gsub("NA_|_NA", "", data$share_report_1_whatsapp)
data$share_report_1_whatsapp_fac <- factor(data$share_report_1_whatsapp)
data$share_report_1_whatsapp_num <- NA
data$share_report_1_whatsapp_num[data$share_report_1_whatsapp == "Nein"] <- 0
data$share_report_1_whatsapp_num[data$share_report_1_whatsapp == "Ja"] <- 1
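
# Sketch (illustrative, not part of the original pipeline): the desktop and
# mobile versions of an item could also be merged with dplyr::coalesce(),
# which takes the first non-missing value per row. Unlike unite() + gsub(),
# which leaves the string "NA" when both versions are missing, coalesce()
# keeps a real NA. `merge_devices()` is a hypothetical helper; it is only
# defined here and never called, and it assumes columns named <stem>_desk
# and <stem>_mob of the same type.
merge_devices <- function(df, stem) {
  desk <- df[[paste0(stem, "_desk")]]
  mob <- df[[paste0(stem, "_mob")]]
  df[[stem]] <- dplyr::coalesce(desk, mob)
  df
}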

# Sharing report 2 ####
# Only relevant for Figure A8

colnames(data)[colnames(data) == "QID758_1"] <- "share_report_2_email_desk"
colnames(data)[colnames(data) == "QID758_2"] <- "share_report_2_fb_desk"
colnames(data)[colnames(data) == "QID758_3"] <- "share_report_2_twitter_desk"
colnames(data)[colnames(data) == "QID758_4"] <- "share_report_2_whatsapp_desk"
colnames(data)[colnames(data) == "QID759_1"] <- "share_report_2_email_mob"
colnames(data)[colnames(data) == "QID759_2"] <- "share_report_2_fb_mob"
colnames(data)[colnames(data) == "QID759_3"] <- "share_report_2_twitter_mob"
colnames(data)[colnames(data) == "QID759_4"] <- "share_report_2_whatsapp_mob"

data <- tidyr::unite(
  data, "share_report_2_email",
  c("share_report_2_email_desk", "share_report_2_email_mob")
)
data$share_report_2_email <- gsub("NA_|_NA", "", data$share_report_2_email)
data$share_report_2_email_fac <- factor(data$share_report_2_email)
data$share_report_2_email_num <- NA
data$share_report_2_email_num[data$share_report_2_email == "Nein"] <- 0
data$share_report_2_email_num[data$share_report_2_email == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_2_fb",
  c(
    "share_report_2_fb_desk",
    "share_report_2_fb_mob"
  )
)
data$share_report_2_fb <- gsub("NA_|_NA", "", data$share_report_2_fb)
data$share_report_2_fb_fac <- factor(data$share_report_2_fb)
data$share_report_2_fb_num <- NA
data$share_report_2_fb_num[data$share_report_2_fb == "Nein"] <- 0
data$share_report_2_fb_num[data$share_report_2_fb == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_2_twitter",
  c(
    "share_report_2_twitter_desk",
    "share_report_2_twitter_mob"
  )
)
data$share_report_2_twitter <- gsub("NA_|_NA", "", data$share_report_2_twitter)
data$share_report_2_twitter_fac <- factor(data$share_report_2_twitter)
data$share_report_2_twitter_num <- NA
data$share_report_2_twitter_num[data$share_report_2_twitter == "Nein"] <- 0
data$share_report_2_twitter_num[data$share_report_2_twitter == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_2_whatsapp",
  c(
    "share_report_2_whatsapp_desk",
    "share_report_2_whatsapp_mob"
  )
)
data$share_report_2_whatsapp <- gsub("NA_|_NA", "", data$share_report_2_whatsapp)
data$share_report_2_whatsapp_fac <- factor(data$share_report_2_whatsapp)
data$share_report_2_whatsapp_num <- NA
data$share_report_2_whatsapp_num[data$share_report_2_whatsapp == "Nein"] <- 0
data$share_report_2_whatsapp_num[data$share_report_2_whatsapp == "Ja"] <- 1

# Belief and sharing report 5 ####

colnames(data)[colnames(data) == "QID739"] <- "belief_report_5_desk"
colnames(data)[colnames(data) == "QID650_1"] <- "share_report_5_email_desk"
colnames(data)[colnames(data) == "QID650_2"] <- "share_report_5_fb_desk"
colnames(data)[colnames(data) == "QID650_3"] <- "share_report_5_twitter_desk"
colnames(data)[colnames(data) == "QID650_4"] <- "share_report_5_whatsapp_desk"
colnames(data)[colnames(data) == "QID740"] <- "belief_report_5_mob"
colnames(data)[colnames(data) == "QID679_1"] <- "share_report_5_email_mob"
colnames(data)[colnames(data) == "QID679_2"] <- "share_report_5_fb_mob"
colnames(data)[colnames(data) == "QID679_3"] <- "share_report_5_twitter_mob"
colnames(data)[colnames(data) == "QID679_4"] <- "share_report_5_whatsapp_mob"

data <- tidyr::unite(data, "belief_report_5", c("belief_report_5_desk", "belief_report_5_mob"))
data$belief_report_5 <- gsub("NA_|_NA", "", data$belief_report_5)
data$belief_report_5_fac <- factor(data$belief_report_5,
  levels = c(
    "0 - Gar nicht", "1", "2", "3", "4", "5",
    "6 - Voll und ganz"
  ),
  ordered = TRUE
)
data$belief_report_5_num <- data$belief_report_5
data$belief_report_5_num[data$belief_report_5_num == "0 - Gar nicht"] <- 0
data$belief_report_5_num[data$belief_report_5_num == "6 - Voll und ganz"] <- 6
data$belief_report_5_num <- as.numeric(data$belief_report_5_num)

data <- tidyr::unite(
  data, "share_report_5_email",
  c("share_report_5_email_desk", "share_report_5_email_mob")
)
data$share_report_5_email <- gsub("NA_|_NA", "", data$share_report_5_email)
data$share_report_5_email_fac <- factor(data$share_report_5_email)
data$share_report_5_email_num <- NA
data$share_report_5_email_num[data$share_report_5_email == "Nein"] <- 0
data$share_report_5_email_num[data$share_report_5_email == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_5_fb",
  c("share_report_5_fb_desk", "share_report_5_fb_mob")
)
data$share_report_5_fb <- gsub("NA_|_NA", "", data$share_report_5_fb)
data$share_report_5_fb_fac <- factor(data$share_report_5_fb)
data$share_report_5_fb_num <- NA
data$share_report_5_fb_num[data$share_report_5_fb == "Nein"] <- 0
data$share_report_5_fb_num[data$share_report_5_fb == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_5_twitter",
  c("share_report_5_twitter_desk", "share_report_5_twitter_mob")
)
data$share_report_5_twitter <- gsub("NA_|_NA", "", data$share_report_5_twitter)
data$share_report_5_twitter_fac <- factor(data$share_report_5_twitter)
data$share_report_5_twitter_num <- NA
data$share_report_5_twitter_num[data$share_report_5_twitter == "Nein"] <- 0
data$share_report_5_twitter_num[data$share_report_5_twitter == "Ja"] <- 1

data <- tidyr::unite(
  data, "share_report_5_whatsapp",
  c("share_report_5_whatsapp_desk", "share_report_5_whatsapp_mob")
)
data$share_report_5_whatsapp <- gsub("NA_|_NA", "", data$share_report_5_whatsapp)
data$share_report_5_whatsapp_fac <- factor(data$share_report_5_whatsapp)
data$share_report_5_whatsapp_num <- NA
data$share_report_5_whatsapp_num[data$share_report_5_whatsapp == "Nein"] <- 0
data$share_report_5_whatsapp_num[data$share_report_5_whatsapp == "Ja"] <- 1
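
# Sketch (not part of the original pipeline; function names are hypothetical).
# The desktop/mobile merge repeated above could be wrapped in small helpers.
# Note one difference: unite() + gsub() leaves the string "NA" when both
# device columns are missing, whereas merge_device_cols() returns a true NA.
merge_device_cols <- function(desk, mob) {
  desk <- as.character(desk)
  mob <- as.character(mob)
  # Use the desktop answer where present, otherwise the mobile answer
  ifelse(is.na(desk), mob, desk)
}

# Map the German answers "Ja"/"Nein" to 1/0; any other value becomes NA
yes_no_to_num <- function(x) {
  unname(c("Nein" = 0, "Ja" = 1)[as.character(x)])
}
# Possible usage (assumption, left commented out so the pipeline is unchanged):
# data$share_report_5_fb_num <- yes_no_to_num(data$share_report_5_fb)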


```

```{r data-treatment, paged.print=FALSE}

# Raw treatments ####

colnames(data)[colnames(data) == "FL_358_DO_FL_359"] <- "treatment_a"
colnames(data)[colnames(data) == "FL_358_DO_FL_384"] <- "treatment_b"
colnames(data)[colnames(data) == "FL_358_DO_FL_419"] <- "treatment_c"
colnames(data)[colnames(data) == "FL_358_DO_FL_454"] <- "treatment_d"
colnames(data)[colnames(data) == "FL_358_DO_FL_489"] <- "treatment_e"
colnames(data)[colnames(data) == "FL_358_DO_FL_524"] <- "treatment_f"
colnames(data)[colnames(data) == "FL_358_DO_FL_559"] <- "treatment_g"
colnames(data)[colnames(data) == "FL_358_DO_FL_594"] <- "treatment_h"

## Source treatment ####
# 0 is "Nachrichten 360", 1 is "Tagesschau"

data$treatment_source <- NA
data$treatment_source[data$treatment_e == 1 |
  data$treatment_f == 1 |
  data$treatment_g == 1 |
  data$treatment_h == 1] <- 0
data$treatment_source[data$treatment_a == 1 |
  data$treatment_b == 1 |
  data$treatment_c == 1 |
  data$treatment_d == 1] <- 1

# Channel treatment ####
# 0 is Facebook, 1 is website

data$treatment_channel <- NA
data$treatment_channel[data$treatment_c == 1 |
  data$treatment_d == 1 |
  data$treatment_g == 1 |
  data$treatment_h == 1] <- 0
data$treatment_channel[data$treatment_a == 1 | data$treatment_b == 1 |
  data$treatment_e == 1 | data$treatment_f == 1] <- 1

# Content treatment ####
# 0 is anti-migration content, 1 is pro-migration content

data$treatment_content <- NA
data$treatment_content[data$treatment_b == 1 | data$treatment_d == 1 |
  data$treatment_f == 1 | data$treatment_h == 1] <- 0
data$treatment_content[data$treatment_a == 1 | data$treatment_c == 1 |
  data$treatment_e == 1 | data$treatment_g == 1] <- 1

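data$treatment_combination <- paste(data$treatment_source, data$treatment_content, sep = "_")
data$treatment_combination <- as.factor(data$treatment_combination)
# Note: ifelse() drops the factor attributes, so after this step the variable
# holds integer level codes (1 = "0_0", 2 = "0_1", 3 = "1_0", 4 = "1_1",
# assuming all four combinations occur) and NA for unassigned respondents.
data <- data %>% mutate(
  treatment_combination = ifelse(treatment_combination == "NA_NA",
    NA, treatment_combination
  )
)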

# Congruence treatments ####
## 0 if incongruent, 1 if congruent

## treatment_congruence50
data$treatment_congruence50 <- NA
data$treatment_congruence50[data$migration_attitude_dummy_50 == 1 & data$treatment_content == 1] <- 1
data$treatment_congruence50[data$migration_attitude_dummy_50 == 0 & data$treatment_content == 0] <- 1
data$treatment_congruence50[data$migration_attitude_dummy_50 == 1 & data$treatment_content == 0] <- 0
data$treatment_congruence50[data$migration_attitude_dummy_50 == 0 & data$treatment_content == 1] <- 0

## treatment_congruence_30
data$treatment_congruence_30 <- NA
data$treatment_congruence_30[data$migration_attitude_dummy_30 == 1 & data$treatment_content == 1] <- 1
data$treatment_congruence_30[data$migration_attitude_dummy_30 == 0 & data$treatment_content == 0] <- 1
data$treatment_congruence_30[data$migration_attitude_dummy_30 == 1 & data$treatment_content == 0] <- 0
data$treatment_congruence_30[data$migration_attitude_dummy_30 == 0 & data$treatment_content == 1] <- 0
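
# Sketch (hypothetical helper, not used in the original pipeline): with both
# inputs coded 0/1, congruence is 1 exactly when the attitude dummy equals the
# content treatment; an NA in either input propagates to NA.
congruence_from <- function(attitude_dummy, content) {
  as.numeric(attitude_dummy == content)
}
# Possible consistency check (assumption, commented out so nothing runs here):
# all.equal(
#   data$treatment_congruence_30,
#   congruence_from(data$migration_attitude_dummy_30, data$treatment_content)
# )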
```
  
```{r data-manipulation-checks, paged.print=FALSE}

# Missing answers are meant to count as wrong (0) in the manipulation checks.
# Caveat: ifelse() returns NA when its test is NA, so this only holds if
# unanswered items are stored as empty strings rather than as NA.

# Content manipulation ####

## Report 2 (0 is anti-migration content)

data <- data %>% rename(report_2_check = QID736)

data <- data %>% mutate(
  report_2_check_correct =
    ifelse(treatment_content == 1,
      ifelse(report_2_check == "Zuwanderer sind in Deutschland unter Tatverdaechtigen nicht ueberrepraesentiert.", 1, 0),
      ifelse(report_2_check == "Zuwanderer sind in Deutschland unter Tatverdaechtigen ueberrepraesentiert.", 1, 0)
    )
)

## Report 3 (0 is anti-migration content)

data <- data %>% rename(report_3_check = QID353)

data <- data %>% mutate(
  report_3_check_correct =
    ifelse(treatment_content == 1,
      ifelse(report_3_check == "Die Mehrheit der Fluechtinge schafft den Deutschtest.", 1, 0),
      ifelse(report_3_check == "Die Mehrheit der Fluechtinge schafft den Deutschtest nicht.", 1, 0)
    )
)

## Report 4 (0 is anti-migration content)

data <- data %>% rename(report_4_check = QID506) # Answer (3 values)

data <- data %>% mutate(
  report_4_check_correct =
    ifelse(treatment_content == 1,
      ifelse(report_4_check == "Private Seenotrettung im Mittelmeer erzeugt keine Sogwirkung.", 1, 0),
      ifelse(report_4_check == "Private Seenotrettung im Mittelmeer erzeugt eine Sogwirkung.", 1, 0)
    )
)

## Source manipulation ####

data <- data %>% rename(source_check = Q140)

data <- data %>% mutate(
  source_check_correct =
    ifelse(treatment_source == 1, # 1 is Tagesschau, 0 is N360
      ifelse(source_check == "Tagesschau", 1, 0),
      ifelse(source_check == "Nachrichten 360", 1, 0)
    )
)
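
# Sketch (hypothetical helper, not used above): score an answer as correct (1)
# or wrong (0) so that a missing answer counts as wrong instead of NA.
# If `expected` itself is NA (no treatment recorded), the result stays NA.
check_correct <- function(answer, expected) {
  correct <- !is.na(answer) & answer == expected
  correct[is.na(expected)] <- NA
  as.numeric(correct)
}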
```



```{r data-focus, cache = FALSE}

# Split raw data for each questionnaire page ####
## Get all question IDs that appear

page_ids <- unique(unlist(str_extract_all(data$FocusData, "QID[0-9][0-9][0-9]")))

# Extract one variable for each question ID ####

data$FocusData <- str_replace_all(data$FocusData, "QID", "--QID")
for (i in page_ids) {
  varname <- i
  data <- data %>%
    mutate(!!varname := NA) %>%
    mutate(!!varname := gsub(
      "QID[0-9]{3}: |QID[0-9]{3}:", "",
      gsub(
        "--Q.*", "",
        str_extract(
          FocusData,
          paste(varname, ":\\sD.*;  ", sep = "")
        )
      )
    ))
}



# Rename variables and unite variables per questionnaire page if necessary ####

data <- data %>% rename(consent_focus_raw = QID212)
data <- data %>% rename(gender_age_state_focus_raw = QID160)
data <- data %>% rename(know_focus_raw = QID189)
data <- data %>% rename(read_focus_raw = QID712)
data <- data %>% rename(trust_focus_raw = QID684)
data <- data %>% unite(
  "attitudes_focus_raw",
  c("QID186", "QID218", "QID184", "QID183", "QID185")
)
data <- data %>% mutate(attitudes_focus_raw = gsub("NA_|_NA", "", attitudes_focus_raw))
data <- data %>% rename(services_focus_raw = QID634)
data <- data %>% unite(
  "sharing_focus_raw",
  c("QID744", "QID747", "QID745", "QID746")
)
data <- data %>% mutate(sharing_focus_raw = gsub("NA_|_NA", "", sharing_focus_raw))
data <- data %>% rename(knowledge_focus_raw = QID808)
data <- data %>% rename(estimate_correct_focus_raw = QID734)
data <- data %>% rename(introduction_focus_raw = QID299)
data <- data %>% unite(
  "report_1_focus_raw",
  c("QID638", "QID643")
)
data <- data %>% mutate(report_1_focus_raw = gsub("NA_|_NA", "", report_1_focus_raw))
data <- data %>% unite(
  "report_2_focus_raw",
  c("QID659", "QID668")
)
data <- data %>% mutate(report_2_focus_raw = gsub("NA_|_NA", "", report_2_focus_raw))
data <- data %>% rename(report_2_check_focus_raw = QID736)
data <- data %>% unite(
  "report_3_focus_raw",
  c("QID664", "QID672")
)
data <- data %>% mutate(report_3_focus_raw = gsub("NA_|_NA", "", report_3_focus_raw))
data <- data %>% rename(report_3_check_focus_raw = QID353)
data <- data %>% unite(
  "report_4_focus_raw",
  c("QID666", "QID674")
)
data <- data %>% mutate(report_4_focus_raw = gsub("NA_|_NA", "", report_4_focus_raw))
data <- data %>% rename(report_4_check_focus_raw = QID506)
data <- data %>% unite(
  "report_5_focus_raw",
  c("QID647", "QID676")
)
data <- data %>% mutate(report_5_focus_raw = gsub("NA_|_NA", "", report_5_focus_raw))
data <- data %>% unite(
  "prompt_focus_raw",
  c("QID741", "QID782", "QID786", "QID779", "QID788")
)
data <- data %>% mutate(prompt_focus_raw = gsub("NA_|_NA", "", prompt_focus_raw))
data <- data %>% rename(source_check_focus_raw = QID809)
data <- data %>% rename(education_turnout_focus_raw = QID220)
data <- data %>% unite(
  "vote_focus_raw",
  c("QID219", "QID754", "QID756")
)
data <- data %>% mutate(vote_focus_raw = gsub("NA_|_NA", "", vote_focus_raw))
data <- data %>% rename(income_html_focus_raw = QID208)
data <- data %>% rename(debrief_focus_raw = QID831)

pages_focus <- c(
  "consent_focus_raw", "gender_age_state_focus_raw", "know_focus_raw",
  "read_focus_raw", "trust_focus_raw", "attitudes_focus_raw",
  "services_focus_raw", "sharing_focus_raw", "knowledge_focus_raw",
  "estimate_correct_focus_raw", "introduction_focus_raw",
  "report_1_focus_raw", "report_2_focus_raw", "report_2_check_focus_raw",
  "report_3_focus_raw", "report_3_check_focus_raw", "report_4_focus_raw",
  "report_4_check_focus_raw", "report_5_focus_raw", "prompt_focus_raw",
  "source_check_focus_raw", "education_turnout_focus_raw", "vote_focus_raw",
  "income_html_focus_raw", "debrief_focus_raw"
)
```

```{r data-screening, paged.print=FALSE}

# Exclude invalid observations ####

data <- data %>% filter(completion == "Complete")
```


\clearpage
\pagenumbering{arabic} 
\renewcommand{\footnotelayout}{\setstretch{1}}
\addtocontents{toc}{\protect\setcounter{tocdepth}{0}}

# Introduction {#sec:introduction}

How citizens manage to stay informed about politics---or not---has been a long-standing concern of political science [@Lippmann1922; @Berelsonetal1954; @CarpiniKeeter1996; @Druckman2014]. In a complex world, people must turn to others for such information. Today, the Internet's "many-to-many" structure [@Tuckeretal2017; @vanAelstetal2017] makes it harder to know which sources to turn to: trust in professional media has been waning in many Western democracies [@Gallup2019; @PWC2018] and is increasingly related to partisanship [@Arceneauxetal2012; @TsfatiAriely2014]. At the same time, the spread of misinformation is rampant [@Guess2018-gw; @Vosoughi2018-nh]. How do news sources affect belief in and sharing of their information in this environment?

Research on sources reaches back decades [e.g., @HovlandWeiss1951], but given the rapidly changing news ecology, it deserves renewed attention. In an original, preregistered survey experiment, we examine how sources influence whether people believe and intend to share news reports in an online context. Previous studies on the effect of sources have mostly compared partisan outlets or politicians [@Druckman2001c; @BaumGussin2008; @Swireetal2017]. In this study, we focus on a different aspect, namely the appearance of fraudulent sources. On the Internet, it takes little effort to invent a news brand. Misinformation entrepreneurs sometimes counterfeit existing sources, as in a recent instance of misinformation on the coronavirus [@Faz2020-xk]. More subtly, they also use professional-sounding names to feign credibility. We mimic such a situation in our experiment by making up a source with a professional-sounding name and appearance. We compare this "fake" source with a real, well-known source, and test in a first step whether this contrast affects people’s belief in a news report as well as their intent to share it.

Our key contribution is to highlight the ideological determinants of source credibility by manipulating the content of news reports in the ensuing experimental stages. Specifically, we randomly expose subjects to several reports that are either congruent or incongruent with their premeasured attitudes. While it is well known that people have a tendency to believe information that supports their worldview [@Lordetal1979; @TaberLodge2006; @JeritBarabas2012], we show that a source that provides the "right" facts is *subsequently* more likely to be believed. This can explain, first, why trust in and exposure to media is partisan [@GoldmanMutz2011; @IyengarHahn2009; @PennycookRand2019c]. Second, we note that this dynamic might be exploited by malevolent agents to generate trust in fake sources.

We further contribute to the literature by thoroughly examining treatment heterogeneity with machine learning methods recently developed by @Athey2016-ow [also @Wager2018-tk]. In contrast to conventional approaches, the causal forest approach allows us to explore heterogeneity over a large covariate space without concerns about multiple testing. We find that treatment effects differ most strongly across mainstream media trust, vote choice, age, political knowledge, and income. Our findings highlight how different groups vary in their vulnerability to fake sources and one-sided reporting.

We proceed as follows: Section \@ref(sec:sources) discusses previous literature on source credibility and develops expectations for the contrast between an existing and a fake source. In Section \@ref(sec:congruence), we elaborate on our prediction that congruent news reports will influence belief and sharing intentions of subsequent information. Section \@ref(sec:methods) presents the design, data, and measures. Section \@ref(sec:results) summarizes the results. Section \@ref(sec:conclusion) provides a conclusion and suggestions for future research.

# Theory and hypotheses {#sec:theory}

## Fake vs. real sources  {#sec:sources}

The importance of sources in information transmission has been a central concern of psychology at least since the work of Hovland and colleagues [@HovlandWeiss1951; @Hovlandetal1953]. Political science joined the debate about "source cues" some decades later [@Zaller1992; @Snidermanetal1991; @Popkin1991; @Pageetal1987; @Mondak1993a]. Both disciplines were mainly interested in how sources affect opinion change (*persuasion*). Fewer studies tested how sources affect *believing* factual information or intentions to *share* it. Only recently, with rising concern about fake news, have scholars paid more attention to the impact of sources on these outcomes [e.g., @PennycookRand2017b; @Swireetal2017; @OeldorfDeVoss2019].

Most of the literature links the effect of sources to the concept of *source credibility*: the more credible a source, the more people tend to be persuaded by its arguments or believe the information it provides. We follow this idea and assume that people hold subjective, not necessarily conscious, impressions about the credibility of a source, which influence outcomes such as belief and sharing intentions.^[There is some confusion about how the concepts of "media trust" and "source credibility" relate [@KohringMatthes2007]. In our terminology, both "credibility" and "trustworthiness" describe a news receiver's perception of the source. When someone perceives a source as credible or trustworthy, she "trusts" that source. The concept of "media trust" describes average credibility/trustworthiness perceptions across a set of sources; "mainstream media trust", the average perception across sources considered mainstream.] A host of experimental studies have tested which source characteristics determine these perceptions. One common example of a manipulation is the *professionalism* or expertise of a source. The seminal study by @HovlandWeiss1951 compared the persuasiveness of a scientific journal to that of a pictorial monthly, a contrast adapted by many subsequent studies [e.g., @Sternthaletal1978; @Pettyetal1981; @Chebatetal1988; @Greer2003]. Another manipulation concerned the source's physical *attractiveness* or likability [e.g., @Chaiken1980; @MillsAronson1965]. *Similarity*, e.g., in social or political identity, between source and receiver was another effective treatment [e.g., @KuklinskiHurley1994; @BaumGroeling2009; @Kuruetal2017].

In the present study, we are interested in a contrast that has become particularly important with the rise of the Internet and social media. First, let us note that we study the situation of people receiving news reports, by which we mean factual information that can be either true or false. People typically do not know ex ante whether a news report is true and usually do not have the time to verify the facts: It is in these situations that sources seem most relevant. Online, the variety of sources is practically unlimited. News reports may originate from real, well-known news organizations. Some of these might be widely perceived as ideologically biased or generally unreliable, but we focus here on real sources that generally enjoy high levels of trust. On the other hand, the Internet facilitates the emergence of fraudulent sources that do not actually represent a news organization. Anyone with some technical understanding can easily set up a Facebook or web page to spread news, and it is easy to imitate real news organizations in appearance and name.

We are interested in this contrast between real sources generally perceived as credible (henceforth *real* sources), and fake sources imitating them in name and appearance (henceforth *fake* sources). To illustrate the relevance of this contrast, consider the story of the made-up *Denver Guardian*, which posted a story on Hillary Clinton that garnered more than half a million interactions on Facebook [@Berghel2017-nu]. The source name played on existing, respected brands like the *Guardian*, helped by a neutral website appearance. As documentation of the most successful fake news stories before the 2016 US election illustrates, such sources---next to more obviously hyperpartisan outlets---are a prominent factor in the dissemination of misinformation [e.g., @Silverman2016].

The question, thus, is whether fake sources are believed substantially less than real sources. Recent studies have examined this contrast only implicitly, by exposing subjects to true and false news with the original source attributions, but have not disentangled source and content effects [@PennycookRand2017b; @PennycookRand2019b; @PennycookRand2019a]. Real and fake sources as defined above may differ on several dimensions. It might be that a fake source’s name and appearance fail to signal that it reports true information. An alternative mechanism may be familiarity: people should be less inclined to believe sources they do not know. Indeed, in some recent studies, familiarity with a source has been found to affect belief [@PennycookRand2019c; @Epsteinetal2019]. Given the same news report provided by either a real or a fake source, we thus predict that *people have a higher tendency to believe news reports from a real source than from a fake source (H1a)*. 

Internet users are not just passive receivers of news but also share it. We adopt a broad definition of sharing, which includes any dissemination of news reports via social media, as well as via messengers and email. Researchers are only beginning to understand what makes news "shareworthy" [@BoczkowskiMitchelstein2012; @Trillingetal2017; @Barbera2015-ej]. It is unclear whether sources affect sharing in a similar way as they affect belief. There could be a path through belief: people share information they believe to be true. Yet, there is evidence that believing something is not a prerequisite for sharing it. People might be motivated to distribute information they know is false for political gain [@ChadwickVaccari2019] or because of a "need for chaos" [@Petersenetal2018]. Alternatively, reputational concerns might play a role: Insofar as social media is about self-presentational needs [@Seidman2013], users might be more willing to share sources with prestige, irrespective of whether they believe the content. Without elaborating on these mechanisms, recent studies have found sources to matter for sharing intentions [@PennycookRand2017b]. In the absence of more direct evidence, we predict that people *will be more likely to share news reports that come from real sources than those that come from fake sources (H1b)*. Taken together, H1a and H1b predict what we call the source effect.

## Congruence & repeated interactions {#sec:congruence}

We start from the assumption that someone's perception of a source's credibility develops over repeated interactions. For example, if credibility partly depends on whether the source objectively tells the truth, then a source should be perceived as more credible when it provided true information and less so when the information turns out to be false. However, patterns of media trust suggest that it cannot be objective truth-telling alone that makes a source subsequently credible in the eyes of receivers: various studies show that people consider media organizations that align with them ideologically more trustworthy [@MediaInsight2017b; @TsfatiAriely2014]. Such purely correlational observations do not tell us about the direction of causality: we do not know whether people align ideologically with sources they have always found credible or whether they find sources more credible after having been supplied with ideologically congruent content.

We here focus on the impact of congruent reporting on later credibility, and thus belief and sharing. Our argument starts from the well-established finding in psychology and political science that people evaluate information according to its *congruence* with their pre-existing attitudes, values, or beliefs [@Lordetal1979; @TaberLodge2006; @MacCounPaletz2009; @Kahanetal2017b]. In particular, they are more likely to accept attitudinally congruent than incongruent *factual* information, irrespective of the actual truth value [@WashburnSkitka2017; @PennycookRand2019b].^[There is a debate whether the effect of congruence can be explained within a Bayesian framework or whether it should be conceived of as "bias", which is not crucial for our expectations below [@GerberGreen1999; @Kahan2016b; @Tappinetal2018].]

We suggest that such one-sided processing of factual information will influence later credibility perceptions of a source. Since people tend to believe (not believe) congruent (incongruent) information because of their directional motivations, they have a *subjective* perception of the source as truth-telling (or not). This credibility perception will at least partly be independent of whether the source is *objectively* truth-telling---assuming people mostly do not actively verify reporting. After repeated encounters with a source that provides congruent information, an individual will hence perceive the source as more credible. In consequence, the individual should be more likely to believe factual information (regardless of what it is) by that source later on. Studies conceiving of the credibility of a source as a probability that is updated in a Bayesian fashion come to similar predictions [@Koehler1993; @GentzkowShapiro2006]. To our knowledge, no experimental study has hitherto tested this prediction in the context of political news. In sum, we expect that *those who previously saw congruent (vs. incongruent) news reports by a source will have a higher tendency to believe a subsequent news report by the same source (H2a).*

As far as sharing intention is concerned, we predict a similar effect of congruence. Again, we do not claim that this effect necessarily goes through belief. It is possible that people learn which sources provide congruent content and can be shared without risk for one’s reputation among ideologically close people. Motivation to share the "right" sources and fear of sharing "wrong" sources might be independent of truth judgement. Whichever the mechanism, we expect that *those who previously saw congruent (vs. incongruent) news reports by a source will be more likely to share a subsequent news report by the same source (H2b)*. We refer to these predictions as the congruence effect.

We further argue that this congruence effect will play out differently for real and fake sources. As argued above, credibility perceptions will depend on previous interactions. For a real source that people are likely to be familiar with, credibility perceptions have formed over a longer period. Such perceptions are likely to be sticky. Relatively few additional encounters should not change much for real sources. In contrast, for a fake source previously unknown to people, a few encounters that involve congruent information may change an individual's perception significantly. This suggests a normatively worrying dynamic: fraudulent actors may quickly gain an individual's trust by providing the "right" facts. In sum, we expect that the *difference in later news belief between those who saw congruent and those who saw incongruent reports is higher for the fake source than for the real source (H3a)*. We also expect the *difference in later sharing intentions between those who saw congruent and those who saw incongruent reports to be higher for the fake source than for the real source (H3b)*.
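
As an illustrative sketch, H3a and H3b can be written as an interaction term in a linear model. The notation is introduced here only for exposition and need not match the models estimated later: $Y_i$ denotes belief in (or intention to share) the final report, $C_i = 1$ if respondent $i$ previously saw congruent reports, and $F_i = 1$ if the source is fake.

$$
Y_i = \beta_0 + \beta_1 C_i + \beta_2 F_i + \beta_3 (C_i \times F_i) + \varepsilon_i
$$

In this formulation, $\beta_1$ is the congruence effect for the real source and $\beta_1 + \beta_3$ is the congruence effect for the fake source, so H3a and H3b correspond to $\beta_3 > 0$.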

## Heterogeneity of source and congruence effects {#sec:heterogeneity}

We are further interested in exploring treatment heterogeneity. As a review of five decades of credibility research notes, the "interaction between source credibility and demographics of recipients has not been researched or analyzed to a great extent" [@Pornpitakpan2004, p. 263]. We thus do not have clear expectations about how the source effect (H1a, H1b) and the congruence effect (H2a, H2b) might vary across subpopulations and chose an exploratory approach without prespecifying any hypotheses. However, we want to point to some evidence that invites speculation about moderators.

First, *age* figures as an important factor impacting media behavior, e.g., online participation [@Hargittai2008-fa; @Loges2001-bh]. Pertinent to our hypothesis on source effects, there is mixed evidence on how people of different ages evaluate different types of media outlets: some surveys suggest that younger people are less impressed by mainstream news sources [@MediaInsight2017b; @Yougov2017; @PWC2018], while others find older people to be greater "media skeptics" [@Gunther1992; @Metzgeretal2003].

The same studies also suggest an important role of *education* and *income*, variables that have also been found to correlate with the ability to distinguish professional and unprofessional websites [@Foggetal2001]. There is also a debate whether higher education and related variables such as *political knowledge* lead to more or less bias in information processing [@Kahan2013b; @Tappinetal2018]. Accordingly, the effect of congruence on believing a source subsequently could vary along these characteristics.

A large number of surveys and studies have examined people's media trust, understood as an individual perception about how far certain types of media, e.g., "the mass media" or "television", can be trusted, report fairly and accurately, and tell the whole story [@FlanaginMetzger2000; @Kiousis2001; @KohringMatthes2007; @Tsfati2010; @Gallup2019]. In particular, *mainstream media trust* has been found to correlate with behaviors of information processing. With regard to our study, it could be that people with high media trust are more affected by the source when judging news and deciding whether to share it. Relatedly, there is evidence that certain *media habits* such as frequency of social media use affect news processing [@Greer2003; @AllcottGentzkow2017].

Finally, *ideology* or *vote choice* have time and again proved to be related to media behavior and information processing [e.g., @Valloneetal1985; @IyengarHahn2009]. In the US, Republicans select and tend to believe different news sources than Democrats [@PennycookRand2019c; @Gallup2019], although the relation between ideology and media evaluations is less clear in Germany [@Yougov2017]. Scholars also debate the role of partisanship in bias in information seeking and processing [@Kahan2016b; @Jost2017; @Dittoetal2018a]. It could be that one side of the political spectrum is more prone to believing sources that cater to their pre-existing beliefs and attitudes.

# Methods: design, data, measures and models {#sec:methods}

## Sample

To test our predictions, we ran a preregistered online survey experiment between March 14 and 29, 2019. We did not collect any data before the preregistration on March 12, 2019 [see Appendix \@ref(sec:preregistration) and https://osf.io/q2ucj]. The non-probability sample of Germans was recruited by the survey company Respondi AG. To enhance external validity, we screened respondents into the survey according to quotas on gender, age (5 categories), and state of residence. We excluded respondents who did not complete the questionnaire (125 respondents). Those who completed the questionnaire in less than half of the median time (308 respondents) were marked as non-completes by the survey company. Robustness tests show that their exclusion does not change results. The final sample included `r format(as.numeric(nrow(data)), nsmall=0, big.mark=",")` participants. See Appendix \@ref(sec:representativeness) for a comparison of our sample to the German population.
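
The speeder rule mentioned above (completion in less than half of the median time) can be expressed in a few lines of R. This is a sketch only: in the study the flag was set by the survey company, and the function name `flag_speeders()`, the choice of seconds as the unit, and the duration values below are hypothetical.

```r
# Hypothetical helper: TRUE for respondents who finished in less than
# half of the median completion time
flag_speeders <- function(duration_sec) {
  duration_sec < 0.5 * stats::median(duration_sec, na.rm = TRUE)
}

# Made-up completion times in seconds, for illustration only
toy_durations <- c(600, 720, 250, 900, 810)
speeder <- flag_speeders(toy_durations)
```

Which respondents enter the median (all who started or only those who finished) is a design choice that this sketch leaves open.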

## Experimental design

The questionnaire and randomization logic were built with Qualtrics. The setup is depicted in Figure \@ref(fig-1). Upon reading about the study's content and giving their consent, subjects indicated their age, gender, and state of residence and were screened in/out accordingly. Subsequently, we measured a range of covariates (see Appendix Table \@ref(tab:tab-A1) for the question wording): to measure familiarity with news sources, we exposed subjects to a list of eleven real or made-up outlet names in random order and asked whether they recognized the source and, if so, whether they had already read or watched news by that source. From the former question, we constructed a "mainstream source knowledge" index averaging scores for seven sources. This was followed by a standard trust question asked for the same list of sources. Again, we constructed a "mainstream media trust" index averaging scores for seven mainstream sources. Subsequently, we measured attitudes on immigration with a five-item battery. The items showed high reliability (Cronbach's alpha = `r round(cronbachs_alpha$total[1],2)`), and we computed an individual average across the five items. 
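
The index construction and reliability check described above can be sketched in base R. The snippet is a minimal illustration under stated assumptions: the item names and toy values are made up, missing values are simply ignored when averaging, and `cronbach_alpha()` is a hypothetical helper implementing the textbook formula rather than the routine that produced the reported value.

```r
# Toy responses for illustration only (NOT the study data):
# six hypothetical respondents answering five attitude items
items <- data.frame(
  item_1 = c(1, 2, 4, 5, 3, 2),
  item_2 = c(2, 2, 5, 4, 3, 1),
  item_3 = c(1, 3, 4, 5, 2, 2),
  item_4 = c(2, 1, 5, 5, 3, 2),
  item_5 = c(1, 2, 4, 4, 3, 1)
)

# Individual index: average across the five items
attitude_index <- rowMeans(items, na.rm = TRUE)

# Hypothetical helper: Cronbach's alpha from the classical formula
# alpha = k / (k - 1) * (1 - sum of item variances / variance of sum score)
cronbach_alpha <- function(x) {
  x <- x[stats::complete.cases(x), , drop = FALSE] # listwise deletion
  k <- ncol(x)
  item_variances <- sum(apply(x, 2, stats::var))
  total_variance <- stats::var(rowSums(x))
  (k / (k - 1)) * (1 - item_variances / total_variance)
}

alpha_toy <- cronbach_alpha(items)
```

The same pattern would apply to the seven-source knowledge and trust indices, with the relevant columns in place of the toy items.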

We further asked respondents whether they had an account for email, Facebook, Twitter and Whatsapp (see Figure \@ref(fig-A7)) and how frequently they shared news via each platform. Sharing daily or several times a week was most common via Whatsapp (`r round((prop.table(table(data$sharing_whatsapp_fac))[4] + prop.table(table(data$sharing_whatsapp_fac))[5]), 3)*100 `%), followed by Twitter (`r round((prop.table(table(data$sharing_twitter_fac))[4] + prop.table(table(data$sharing_twitter_fac))[5]), 3)*100 `%), Facebook (`r round((prop.table(table(data$sharing_fb_fac))[4] + prop.table(table(data$sharing_fb_fac))[5]), 3)*100`%) and email (`r round((prop.table(table(data$sharing_email_fac))[4] + prop.table(table(data$sharing_email_fac))[5]), 3)*100`%). This question was followed by a battery measuring political knowledge and a question eliciting overconfidence in that knowledge.

\begin{figure}[H]
\centering
\caption{Experimental setup}\label{fig-1}
		\includegraphics[width=0.8\linewidth]{input/fig-1.pdf}
\begin{flushleft}
{ 
\scriptsize

\textbf{Note:} Participants read news reports 1--5; Reports 1 and 5 are true and the same for all participants; Reports 2--4 are manipulated, either showing anti-immigration or pro-immigration content. Combined with respondents' previous attitudes, we construct a randomized treatment referring to either congruent or incongruent ideological content. The experiment contained a third randomized dimension, which is not depicted here (see below and App. A.10.5).

\normalsize
}
\end{flushleft}
\end{figure}

As depicted in Figure \@ref(fig-1), the ensuing experimental stage required subjects to read five news reports, Reports 1--5. The central task was to indicate their belief and sharing intention for the two true reports, i.e., Reports 1 and 5. All reports, whether true or manipulated, provided information about a specific factual question bearing on immigration and immigration policy, a topic likely to be salient to respondents.^[One possible concern is that we conducted our experiment during times in which those issues were particularly controversial. Google Trends data suggest otherwise (see Appendix \@ref(sec:saliency), Figures \@ref(fig-A19) and \@ref(fig-A20)). Regarding the issue of immigration, while their concrete policy positions vary, Germany’s parties can be broadly categorized into immigration-critical parties (AfD, CDU/CSU) and parties that are more supportive of immigration (SPD, Greens, Left Party).] Report 1 (true) presented current numbers of family reunions among refugees in Germany. Report 2 (manipulated) concerned the representation of immigrants in crime statistics. Report 3 (manipulated) provided data on the performance of refugees in German language tests. Report 4 (manipulated) presented data on a possible pull effect of private sea rescue in the Mediterranean, and Report 5 (true) concerned the use of emigration subsidies. To make the contrasts between sources meaningful, we had to assume that few subjects had read the true Reports 1 and 5 on which we measured our outcomes: as discussed previously, sources should matter when news receivers do not already know the truth. We therefore picked reports that had a low number of shares on social media, and for which we found little related reporting.^[Note that we opted to measure the outcomes on (slightly modified) true rather than fake news reports. It would also be interesting to understand how our treatments affect believing and sharing of false content. 
However, the plausibility of the chosen fake news report is hard to calibrate. Had we inadvertently picked a very implausible piece of misinformation, people would not believe it irrespective of the source. This is less problematic for a true news report, because it represents an aspect of the real world.]

To boost experimental realism, we presented reports exactly as they would look online, i.e., as posts on the social media platform Facebook or articles on news websites, by providing a screenshot of the (allegedly) original report before the text. Figure \@ref(fig-2) shows an example of such a screenshot and Appendix \@ref(sec:screenshots) includes more examples.

\begin{figure}[H]
\centering
\caption{Example of Facebook post screenshot accompanying reports}\label{fig-2}
		\includegraphics[width=0.6\linewidth]{input/fig-2.png}
\floatfoot{\textbf{Note:} The teaser translates as "The figures from the latest police statistics show: Immigrants are not suspected of a crime in Germany above average", the headline as "Immigrants not overrepresented among criminal suspects" and the beginning of the text as "Last year, 1.9 percent of all crime suspects...".}
\end{figure}
\vspace{-0.5cm}

We manipulated the reports along three dimensions. In the following, we refer exclusively to the two treatment dimensions relevant to our predictions. We discuss the third dimension, i.e., whether the news report was taken from Facebook or a news website, elsewhere. Corresponding to the two treatment dimensions discussed, participants were hence randomly assigned to four groups. Appendix \@ref(sec:balance) provides balance statistics and shows that there are no significant differences across treatment conditions. Most importantly, immigration attitudes and voting intentions do not differ significantly across treatments.

**Source Treatment**: Across all five news reports, we varied the news source, randomly assigning participants to receive all five reports from one of two sources. We operationalized the *real* source using the name and appearance of the *Tagesschau*, the news section of the largest German public broadcaster, likely to be known by most people. For the fake source, we made up an outlet called *Nachrichten 360*. We chose a name that sounded neutral and without any partisan tendency, and made the logo and appearance look typical of real news organizations, thereby reflecting the logic of fake sources feigning credibility. As Figure \@ref(fig-A2) illustrates, the real source is indeed known by most people. In contrast, few people believe they know the fake source.

**Content & Congruence Treatment**: We further varied the content of the three middle reports, Reports 2--4. Report 1 was a true (but slightly shortened) report and identical for all subjects. Thereafter, respondents were randomly assigned to read either a constructed "pro-immigration" or "anti-immigration" version of each of the Reports 2--4. To construct the two versions of a report, we started from a piece of real data and manipulated it in two directions, so that one version would be convenient to subjects with an immigration-friendly outlook, and the other version to those with an immigration-skeptic outlook. For example, Report 2 provided the latest statistics on crime rates among immigrants as compared to natives. The *pro-immigration* treatment group read that rates were lower than in reality; the *anti-immigration* treatment group read that they were higher. Report 5 was again true (but slightly shortened) and the same for all respondents.

We combined the content randomization of Reports 2--4 with participants' immigration attitudes to construct a *congruence treatment*. The treatment is congruent if an immigration-skeptic respondent reads an *anti-immigration report* or if a supporter of immigration reads a *pro-immigration report*. The incongruent treatment consisted of the analogous opposite cases. We defined those below the mean of the composite index as immigration skeptics, and those above the mean as immigration supporters. This choice is robust against an alternative implementation, defining immigration skeptics and supporters in terms of tertiles, as shown in Appendix \@ref(sec:robust-tertile). 
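The construction of the congruence indicator can be sketched as follows. This is a minimal illustration on invented toy data, not the study's code: the column names (`immigration_index`, `content_pro`, `supporter`, `congruent`) are hypothetical, and treating respondents who fall exactly at the mean as skeptics is an assumption of the sketch.

```r
# Hypothetical toy data: names and values are illustrative only
toy <- data.frame(
  immigration_index = c(1.2, 2.8, 3.6, 4.4, 2.0, 4.9), # average of five attitude items
  content_pro       = c(1, 0, 1, 0, 0, 1)              # 1 = pro-immigration version of Reports 2-4
)

# Mean split: above the sample mean = immigration supporter, otherwise skeptic
# (respondents exactly at the mean count as skeptics here, by assumption)
toy$supporter <- as.integer(toy$immigration_index > mean(toy$immigration_index))

# Congruent if the direction of the content matches the respondent's attitude
toy$congruent <- as.integer(toy$supporter == toy$content_pro)

toy
```

Because the content direction is randomized and attitudes are measured before treatment, congruence is randomly assigned conditional on a respondent's attitude.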

After Reports 1 and 5, respectively, subjects were asked to indicate whether they believed the report was true (*On a scale from 0 to 6, do you think that the information in the text of [source] is true? 0 means not at all, 6 means completely*), and whether they would share the report with the question: *Would you share the message from [news source] that you just read via...*, followed by a small battery that included the options *Email: Yes/No, Facebook: Yes/No, Twitter: Yes/No, Whatsapp: Yes/No*, depending on which services respondents had indicated using. Although this question measures hypothetical sharing decisions, we are confident that it is an approximation of real sharing behavior, as shown elsewhere [@Moslehetal2019]. This measure of sharing intention has provided valuable insights in several recent studies [@Pennycooketal2020a; @Pennycooketal2019b].

Finally, we inquired about subjects' education, their turnout and vote choice at the previous election, income, and basic programming knowledge. Before participants could complete the survey, we debriefed and informed them about the purpose of the experiment and the corresponding manipulations. Our experimental design was approved by the Ethics Committee of the European University Institute under file number #CG8-1-2019. We employed several strategies to mitigate the impact of deception in our study: First, we clarified which of the sources and contents were constructed, and provided subjects with the true facts for the manipulated reports. Second, we emailed more substantive information related to the news reports to those interested a few weeks after the study. Third, we provided them with an open-ended feedback box after the debriefing, which revealed that the survey experience was overwhelmingly positive. Please see Appendix \@ref(sec:openfeedback) for further discussion of ethical considerations.

## Analysis

To test our preregistered hypotheses, we rely on t-tests for the main effects and a linear regression for the interaction effect. In addition, we conducted a whole series of robustness checks (cf. Section \@ref(sec:robustnesschecks)) and manipulation checks to make sure participants received the treatment (cf. Section \@ref(sec:robust-manipcheck)), and we tracked their attention while answering the questionnaire with a JavaScript module developed by @DiedenhofenMusch2017. 
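The relation between the two estimators can be illustrated with simulated data. The sketch below is not our analysis code: the variable names (`sim`, `source`, `congruence`, `belief`) and the effect sizes are invented. It shows that the difference in means from an equal-variance t-test equals the OLS slope on a binary treatment, and that in a fully interacted model the congruence coefficient is the congruence effect in the group coded 0.

```r
set.seed(42)
n <- 200

# Simulated 2 x 2 design with 50 observations per cell (values are invented)
sim <- data.frame(
  source     = rep(c(0, 1), each = n / 2),   # 0 = fake source, 1 = real source
  congruence = rep(c(0, 1), times = n / 2)   # 0 = incongruent, 1 = congruent
)
sim$belief <- 3 + 0.4 * sim$source + 0.3 * sim$congruence -
  0.3 * sim$source * sim$congruence + rnorm(n)

# Main effect: two-sample t-test with equal variances
tt <- t.test(belief ~ source, data = sim, var.equal = TRUE)
diff_means <- unname(tt$estimate[2] - tt$estimate[1])

# The OLS slope on the binary treatment is the same difference in means
ols <- lm(belief ~ source, data = sim)
ols_slope <- unname(coef(ols)["source"])

# Interaction model: congruence effect for source = 0 is the congruence coefficient
int <- lm(belief ~ source * congruence, data = sim)
effect_fake <- unname(coef(int)["congruence"])
effect_real <- unname(coef(int)["congruence"] + coef(int)["source:congruence"])
```

Since both treatments are binary, the interacted model is saturated, so its coefficients reproduce the four cell means exactly.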

Regarding treatment heterogeneity discussed in Section \@ref(sec:heterogeneity), we did not prespecify any predictions. Because of the large number of covariates, testing for heterogeneity manually by either iteratively subsetting the data or including a large number of interaction terms would be cumbersome. More importantly, p-values would no longer be valid due to multiple testing issues [@Athey2017-sy]. We therefore follow an exploratory approach and rely on novel machine learning methods developed in @Athey2016-ow, @Wager2018-tk, and @Athey2019-fy, which allow for testing heterogeneity across any number of covariates without running into validity issues. The "causal forest" approach is based on the fundamental concept of random forests as well as the generalized random forest framework proposed by @Athey2019-fy. See Appendix \@ref(sec:thoverview) for details.
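To give a sense of what a causal forest estimates, the following sketch fits one to simulated data with the `grf` package, which implements generalized random forests. It assumes that `grf` is installed; the covariate names, the simulated effect, and the tuning values are invented for illustration and are not the specification reported in the Appendix.

```r
# Runs only if the grf package is available (assumption of this sketch)
if (requireNamespace("grf", quietly = TRUE)) {
  set.seed(123)
  n_sim <- 1000

  # Invented covariates; in the simulation only media_trust moderates the effect
  X <- matrix(rnorm(n_sim * 3), n_sim, 3,
    dimnames = list(NULL, c("media_trust", "age", "knowledge"))
  )
  W <- rbinom(n_sim, 1, 0.5)                # randomized binary treatment
  tau <- 0.5 * (X[, "media_trust"] > 0)     # true heterogeneous effect
  Y <- tau * W + rnorm(n_sim)               # outcome

  cf <- grf::causal_forest(X, Y, W, num.trees = 500)

  tau_hat <- predict(cf)$predictions        # out-of-bag individual effect estimates
  ate <- grf::average_treatment_effect(cf)  # estimate and standard error
  importance <- grf::variable_importance(cf) # how often each covariate is split on
}
```

In this simulated setting, one would expect the covariate that drives the effect to receive the largest importance, which mirrors how we use the method to flag candidate moderators.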


# Empirical results {#sec:results}

```{r models-for-fig-3, message=FALSE, warning=FALSE}
# Hypotheses H1a, H1b ####

# T-tests ####

h1_ttest_results <- data %>%
  select(
    treatment_source,
    belief_report_1_num,
    share_report_1_email_num,
    share_report_1_fb_num,
    share_report_1_twitter_num,
    share_report_1_whatsapp_num
  ) %>%
  gather(variable, value, belief_report_1_num:share_report_1_whatsapp_num) %>%
  na.omit(value) %>%
  group_by(variable) %>%
  do(tidy(t.test(value ~ treatment_source,
    data = ., na.action = "na.omit",
    var.equal = TRUE
  ))) %>%
  ungroup() %>%
  mutate(estimate = estimate2 - estimate1) %>%
  mutate(estimate = round(estimate, 2)) %>%
  mutate(p.value = format(round(p.value, 2), nsmall = 2)) %>%
  mutate(variable = case_when(
    variable == "belief_report_1_num" ~ "Belief",
    variable == "share_report_1_email_num" ~ "Email",
    variable == "share_report_1_fb_num" ~ "Facebook",
    variable == "share_report_1_twitter_num" ~ "Twitter",
    variable == "share_report_1_whatsapp_num" ~ "Whatsapp"
  )) %>%
  mutate(N = parameter + 2)

# Linear models ####

h1a_lm <- lm(belief_report_1_num ~ treatment_source, data = data)
h1b_email_lm <- lm(share_report_1_email_num ~ treatment_source, data = data)
h1b_fb_lm <- lm(share_report_1_fb_num ~ treatment_source, data = data)
h1b_twitter_lm <- lm(share_report_1_twitter_num ~ treatment_source, data = data)
h1b_whatsapp_lm <- lm(share_report_1_whatsapp_num ~ treatment_source, data = data)

# SEs taking heteroskedasticity into account

h1_results_robust <- rep(list(NULL), 5)
objects <- c(
  "h1a_lm",
  "h1b_email_lm",
  "h1b_fb_lm",
  "h1b_whatsapp_lm",
  "h1b_twitter_lm"
)
names(h1_results_robust) <- objects

for (i in objects) {
  assign("fit.i", get(i))
  cov_mat.i <- sandwich::vcovHC(fit.i, type = "HC1")
  h1_results_robust[[i]] <- lmtest::coeftest(fit.i, vcov = cov_mat.i)
}

source_effect_belief <- h1_ttest_results$estimate[h1_ttest_results$variable == "Belief"]
source_effect_email <- h1_ttest_results$estimate[h1_ttest_results$variable == "Email"]
source_effect_twitter <- h1_ttest_results$estimate[h1_ttest_results$variable == "Twitter"]
source_effect_facebook <- h1_ttest_results$estimate[h1_ttest_results$variable == "Facebook"]
source_effect_whatsapp <- h1_ttest_results$estimate[h1_ttest_results$variable == "Whatsapp"]
```

## Source treatment

Hypothesis H1a posits that *people have a higher tendency to believe news reports from a real source than from a fake source*. To test this, we compare belief in Report 1, which has the same content for all subjects, across source treatment groups. Plot A in Figure \@ref(fig-3) visualizes group means and results from t-tests. The real source indeed elicited higher average belief (also see Table \@ref(tab:tab-A11) for OLS estimates). In substantive terms, the difference of `r source_effect_belief` on a 7-point scale is comparable to source effects in studies with similar designs [e.g., @KnightGallup2018]. It is, however, much smaller than individual-level predictors of belief such as familiarity with a story in @Pennycooketal2018. 

We also expected that people *would be more likely to share news reports that come from real sources than those that come from fake sources (H1b)*. Plot B in Figure \@ref(fig-3) illustrates that intention to share is generally low. The plot also does not suggest large source effects on sharing. T-tests for email (mean difference: `r source_effect_email`) and Twitter (mean difference: `r source_effect_twitter`) do not yield statistically significant effects. In the case of Twitter, any real difference would be difficult to detect due to the small number of Twitter users in our sample. In contrast, there are significant differences for Facebook (mean difference: `r source_effect_facebook`) and Whatsapp (mean difference: `r source_effect_whatsapp`). 

Although we did not assume that the treatment effect on sharing must necessarily work through belief, we ran some additional analyses to explore this possibility. Figure A6 in the Appendix shows that belief is positively correlated with all sharing outcomes (about 0.2 for each correlation). If sharing could be fully explained by belief, we would expect a correlation of 1. The much lower correlation suggests that other factors are more important.

Effects on sharing seem small but could turn out substantial in the presence of cascade dynamics on social media [@DelVicarioetal2016]. The plots show a low level of sharing tendency in general. It is somewhat lower than in recent comparable studies in the US context, although these collapse several response categories [@Pennycooketal2018; @Pennycooketal2020a; @Pennycooketal2019b].^[Sharing intentions reported by the cited studies include both subjects answering "yes" and subjects answering "maybe".] However, once we look at results for "frequent sharers" only, we find that both baseline sharing and the source effect on Facebook sharing are larger (see Appendix \@ref(sec:robust-frequentsharers)). We further looked at sharing intentions by partisanship. In contrast to observational studies in the US [e.g., @Benkleretal2018], we do not find any clear patterns in partisan sharing behavior (see Appendix \@ref(sec:robust-partisansharing)).

```{r fig-3, echo=FALSE, fig.cap="Source effect on news belief and sharing\\label{fig-3}", fig.height=4, fig.width=10, message=FALSE, warning=FALSE, out.extra=''}

# Annotations for graphs

annotations <- paste(h1_ttest_results$variable,
  ": Coef = ", h1_ttest_results$estimate,
  "; p = ", h1_ttest_results$p.value,
  ", N = ", h1_ttest_results$N,
  sep = ""
)
annotations[1] <- gsub("Belief: ", "", annotations[1])

# Summary of treatment means and SEs (belief)

results_df_belief2 <- data %>%
  rename(treatment_plot = treatment_source) %>%
  group_by(treatment_plot) %>%
  dplyr::select(belief_report_1_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  ) %>%
  mutate(Group = ifelse(treatment_plot == 1, "Tagesschau", "Nachrichten 360")) %>%
  mutate(outcome = "belief") %>%
  mutate(outcome = str_to_title(outcome))

# Summary of treatment means and SEs (sharing)

results_df_share2 <- data %>%
  rename(treatment_plot = treatment_source) %>%
  group_by(treatment_plot) %>%
  dplyr::select(
    share_report_1_email_num,
    share_report_1_fb_num,
    share_report_1_twitter_num,
    share_report_1_whatsapp_num
  ) %>%
  group_by(N = n(), add = TRUE) %>%
  gather(
    key = "variable", value = "value",
    share_report_1_email_num:share_report_1_whatsapp_num
  ) %>%
  group_by(treatment_plot, variable) %>%
  summarize(
    N = mean(N),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  ) %>%
  mutate(Group = ifelse(treatment_plot == 1, "Tagesschau", "Nachrichten 360")) %>%
  mutate(variable = case_when(
    variable == "share_report_1_email_num" ~ "Email",
    variable == "share_report_1_fb_num" ~ "Facebook",
    variable == "share_report_1_twitter_num" ~ "Twitter",
    variable == "share_report_1_whatsapp_num" ~ "Whatsapp"
  ))

# Plot for treatment effects on belief

p1 <- ggplot(results_df_belief2, aes(
  x = outcome, y = mean,
  group = as.factor(treatment_plot),
  color = as.factor(treatment_plot)
)) +
  geom_point(position = position_dodge(0.4), size = 1.1) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .3,
    position = position_dodge(0.4), size = 0.7
  ) +
  labs(title = "(A) Outcome: Belief ", x = "", y = "Belief Report 1 (0-6)") +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (fake)", "Tagesschau (real)"),
    name = "Source treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 12),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 13),
    legend.text = element_text(size = 12),
    plot.title = element_text(size = 13, hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
  ) +
  ylim(0, 6) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 5.7,
    label = str_replace(annotations[1], "NA: ", ""),
    color = "#575757", size = 4, hjust = 0, vjust = 1
  )

# Annotations for sharing

annotations_sharing <- annotations[-1]
annotations_sharing <- paste(annotations_sharing, collapse = "\n")

# Plot of treatment effects on sharing

p2 <- ggplot(
  results_df_share2,
  aes(
    x = variable, y = mean,
    group = as.factor(treatment_plot),
    color = as.factor(treatment_plot)
  )
) +
  geom_point(position = position_dodge(0.5), size = 1.1) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .4,
    position = position_dodge(0.5), size = 0.7
  ) +
  labs(title = "(B) Outcome: Sharing intention", x = "", y = "Share Report 1 (0/1)") +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (fake)", "Tagesschau (real)"),
    name = "Source treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_text(
      size = 12, vjust = 1,
      margin = margin(0.2, 0, 0, 0, "cm")
    ),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 13),
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm")
  ) +
  ylim(0, 1) +
  ggplot2::annotate(
    geom = "text", x = 1.5, y = 0.95, label = annotations_sharing,
    color = "#575757", size = 4, hjust = 0, vjust = 1
  )

# Combine and save plots

graph_combined <- ggarrange(p1, p2,
  common.legend = TRUE,
  legend = "bottom", ncol = 2, nrow = 1,
  font.label = list(size = 20),
  widths = c(1, 2)
)



annotate_figure(graph_combined,
    bottom = text_grob("Note: Plots show point estimates and 95% confidence intervals for average belief, and sharing intentions, per source treatment group.\nCoefficients and p-values from t-tests are listed in the top part of the plots.", color = "black",
                                  hjust = 0, x = 0, size = 11)
)
```







## Congruence effects

```{r models-for-fig-4, paged.print=FALSE}

# Hypotheses H2a, H2b ####

# T-tests ####
h2_ttest_results <- data %>%
  select(
    treatment_congruence50,
    belief_report_5_num,
    share_report_5_email_num,
    share_report_5_fb_num,
    share_report_5_twitter_num,
    share_report_5_whatsapp_num
  ) %>%
  gather(variable, value, belief_report_5_num:share_report_5_whatsapp_num) %>%
  na.omit(value) %>%
  group_by(variable) %>%
  do(tidy(t.test(value ~ treatment_congruence50,
    data = ., na.action = "na.omit",
    var.equal = TRUE
  ))) %>%
  ungroup() %>%
  mutate(estimate = estimate2 - estimate1) %>%
  mutate(estimate = round(estimate, 2)) %>%
  mutate(p.value = format(round(p.value, 2), nsmall = 2)) %>%
  mutate(variable = case_when(
    variable == "belief_report_5_num" ~ "Belief",
    variable == "share_report_5_email_num" ~ "Email",
    variable == "share_report_5_fb_num" ~ "Facebook",
    variable == "share_report_5_twitter_num" ~ "Twitter",
    variable == "share_report_5_whatsapp_num" ~ "Whatsapp"
  )) %>%
  mutate(N = parameter + 2)

# Linear models ####
h2a_lm <- lm(belief_report_5_num ~ treatment_congruence50, data = data)
h2b_email_lm <- lm(share_report_5_email_num ~ treatment_congruence50, data = data)
h2b_fb_lm <- lm(share_report_5_fb_num ~ treatment_congruence50, data = data)
h2b_twitter_lm <- lm(share_report_5_twitter_num ~ treatment_congruence50, data = data)
h2b_whatsapp_lm <- lm(share_report_5_whatsapp_num ~ treatment_congruence50, data = data)

# Calculate SEs that take heteroskedasticity into account

h2_results_robust <- rep(list(NULL), 5)
objects <- c(
  "h2a_lm",
  "h2b_email_lm",
  "h2b_fb_lm",
  "h2b_twitter_lm",
  "h2b_whatsapp_lm"
)
names(h2_results_robust) <- objects

# Loop
for (i in objects) {
  assign("fit.i", get(i))
  cov_mat.i <- sandwich::vcovHC(fit.i, type = "HC1")
  h2_results_robust[[i]] <- lmtest::coeftest(fit.i, vcov = cov_mat.i)
}

# Index by the `variable` column (as for H1) rather than comparing the whole data frame
congruence50_effect_belief <- h2_ttest_results$estimate[h2_ttest_results$variable == "Belief"]
congruence_effect_email <- h2_ttest_results$estimate[h2_ttest_results$variable == "Email"]
congruence_effect_whatsapp <- h2_ttest_results$estimate[h2_ttest_results$variable == "Whatsapp"]
congruence_effect_twitter <- h2_ttest_results$estimate[h2_ttest_results$variable == "Twitter"]
congruence_effect_facebook <- h2_ttest_results$estimate[h2_ttest_results$variable == "Facebook"]
```

We further expected that *those who previously saw congruent (vs. incongruent) news reports by a source would have a higher tendency to believe a subsequent news report by the same source (H2a)* and that they would be *more likely to share the news report (H2b)*. We test this prediction for Report 5, which had the same content across treatments, after subjects had read three reports, either congruent or incongruent according to their pre-treatment attitudes. 

Figure \@ref(fig-4) visualizes average belief across the two treatments and results from t-tests (see Table \@ref(tab:tab-A12) for OLS estimates). In support of H2a, subjects who saw three congruent reports indeed show a higher belief in the subsequent report (`r congruence50_effect_belief` on a scale from 0 to 6). The effect is smaller than in studies that test the effect of congruence of information on believing the *same* information. For example, @Kuruetal2017 find that a congruent report, compared to an incongruent report, is judged 0.14 units more true on average (on a scale from 0 to 1). However, in our comparison, the treatment difference is not due to the content of Report 5, which is the same across treatments, but can only be explained by *previous* exposure to three news reports.

```{r fig-4, echo=FALSE, fig.cap="Congruence treatment, belief and sharing\\label{fig-4}", fig.height=4, fig.width=10, message=FALSE, warning=FALSE, out.extra=''}

# Annotations

annotations <- paste(h2_ttest_results$variable,
  ": Coef. = ", h2_ttest_results$estimate,
  "; p = ", h2_ttest_results$p.value,
  ", N = ", h2_ttest_results$N,
  sep = ""
)

# Summary stats data frames

results_df_belief2 <- data %>%
  rename(treatment_plot = treatment_congruence50) %>%
  group_by(treatment_plot) %>%
  select(belief_report_5_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  ) %>%
  mutate(Group = ifelse(treatment_plot == 1, "Congruent", "Incongruent")) %>%
  mutate(outcome = "belief") %>%
  mutate(outcome = str_to_title(outcome))

results_df_share2 <- data %>%
  rename(treatment_plot = treatment_congruence50) %>%
  group_by(treatment_plot) %>%
  select(
    share_report_5_email_num,
    share_report_5_fb_num,
    share_report_5_twitter_num,
    share_report_5_whatsapp_num
  ) %>%
  group_by(N = n(), add = TRUE) %>%
  gather(
    key = "variable", value = "value",
    share_report_5_email_num:share_report_5_whatsapp_num
  ) %>%
  group_by(treatment_plot, variable) %>%
  summarize(
    N = mean(N),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  ) %>%
  mutate(Group = ifelse(treatment_plot == 1, "Congruent", "Incongruent")) %>%
  mutate(variable = case_when(
    variable == "share_report_5_email_num" ~ "Email",
    variable == "share_report_5_fb_num" ~ "Facebook",
    variable == "share_report_5_twitter_num" ~ "Twitter",
    variable == "share_report_5_whatsapp_num" ~ "Whatsapp"
  ))

# Plot of interaction effect on belief

p1 <- ggplot(
  results_df_belief2,
  aes(
    x = outcome, y = mean,
    group = as.factor(treatment_plot),
    linetype = as.factor(treatment_plot)
  )
) +
  geom_point(position = position_dodge(0.4), size = 1.1) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .3,
    position = position_dodge(0.4), size = 0.25
  ) +
  labs(title = "(A) Outcome: Belief ", x = "", y = "Belief Report 5 (0-6)") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 12),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 13),
    legend.text = element_text(size = 12),
    plot.title = element_text(size = 13, hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
  ) +
  ylim(0, 6) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 5.7,
    label = str_replace(annotations[1], "Belief: ", ""),
    color = "#575757", size = 4, hjust = 0, vjust = 1
  )

# Annotations

annotations_sharing <- annotations[-1]
annotations_sharing <- paste(annotations_sharing, collapse = "\n")

p2 <- ggplot(
  results_df_share2,
  aes(
    x = variable, y = mean,
    group = as.factor(treatment_plot),
    linetype = as.factor(treatment_plot)
  )
) +
  geom_point(position = position_dodge(0.5), size = 1.1) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .35,
    position = position_dodge(0.5), size = 0.3
  ) +
  labs(title = "(B) Outcome: Sharing intention", x = "", y = "Share Report 5 (0/1)") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_text(
      size = 12, vjust = 1,
      margin = margin(0.2, 0, 0, 0, "cm")
    ),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 13),
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm")
  ) +
  ylim(0, 1) +
  ggplot2::annotate(
    geom = "text", x = 1.5, y = 0.95, label = annotations_sharing,
    color = "#575757", size = 4, hjust = 0, vjust = 1
  )

# Combine plots

graph_combined <- ggarrange(p1, p2,
  common.legend = TRUE,
  legend = "bottom", ncol = 2, nrow = 1, widths = c(1, 2)
)


annotate_figure(graph_combined,
    bottom = text_grob("Note: Plots show point estimates and 95% confidence intervals for average belief, and sharing intentions, per congruence treatment group.\nCoefficients and p-values from t-tests are listed in the top part of the plots.", color = "black",
                                  hjust = 0, x = 0, size = 11)
)

```



There is less evidence for effects of previous congruence on later sharing (H2b). T-tests for email (mean difference: `r congruence_effect_email`), Whatsapp (mean difference: `r congruence_effect_whatsapp`), and Twitter (mean difference: `r congruence_effect_twitter`) do not show statistically significant effects. Only for Facebook does there seem to be a significant effect (mean difference: `r congruence_effect_facebook`): this implies that people are 5 percent more likely to express an intention to share the same report when it comes from a source that has previously provided congruent content than from a source that has previously provided incongruent content. Again, we point the reader to correlations between belief and sharing of Report 5 as reported in Figure A6 in the Appendix. Positive but weak correlations imply that belief is not a strong factor for sharing intentions.

```{r models-for-fig-5, include=FALSE}

# Hypotheses H3a, H3b ####

# Linear models

h3a_lm <- lm(belief_report_5_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data)
h3a_lm_sum <- summary(h3a_lm)$coef

h3b_email_lm <- lm(share_report_5_email_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data)
h3b_email_lm_sum <- summary(h3b_email_lm)$coef

h3b_fb_lm <- lm(share_report_5_fb_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data)
h3b_fb_lm_sum <- summary(h3b_fb_lm)$coef

h3b_twitter_lm <- lm(share_report_5_twitter_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data)
h3b_twitter_lm_sum <- summary(h3b_twitter_lm)$coef

h3b_whatsapp_lm <- lm(share_report_5_whatsapp_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data)
h3b_whatsapp_lm_sum <- summary(h3b_whatsapp_lm)$coef


# Calculate SEs that take heteroskedasticity into account

h3_results_robust <- rep(list(NULL), 5)
objects <- c(
  "h3a_lm",
  "h3b_email_lm",
  "h3b_fb_lm",
  "h3b_twitter_lm",
  "h3b_whatsapp_lm"
)
names(h3_results_robust) <- objects

# Loop over the fitted models and store coefficient tests with HC1 robust SEs

for (i in objects) {
  fit.i <- get(i)
  cov_mat.i <- sandwich::vcovHC(fit.i, type = "HC1")
  h3_results_robust[[i]] <- lmtest::coeftest(fit.i, vcov = cov_mat.i)
}
congruence_interaction_effect <- round(h3a_lm_sum[4, 1], 2)
fakesource_congruence_effect <- round(h3a_lm_sum[3, 1] + 0 * (h3a_lm_sum[4, 1]), 2)
realsource_congruence_effect <- round(h3a_lm_sum[3, 1] + h3a_lm_sum[4, 1], 2)
```

## Source--congruence interaction effects

We further hypothesized that the *difference in later news belief between those who saw congruent and those who saw incongruent reports would be larger for the fake source than for the real source (H3a)* and expected the same for *sharing intentions (H3b)*. Figure \@ref(fig-5) visualizes the means in the four treatment groups as well as OLS estimates for the interaction effect (see also Table \@ref(tab:tab-A13)). For belief, the significant interaction term (b = `r congruence_interaction_effect`) supports our prediction. The positive regression coefficients for congruence and source and the negative interaction coefficient imply that, for the fake source (source treatment variable = 0), the effect of congruence on belief is `r fakesource_congruence_effect`. In contrast, the effect of congruence is negligible for the real source (`r realsource_congruence_effect`). In substantive terms, the effect of congruence for the fake source (`r fakesource_congruence_effect`) is more than half of the main source effect noted above (`r source_effect_belief`). In other words, the fake source can "make up" about half of the gap in belief after only three congruent stories. However, we do not know whether and at what point this trend would lead to convergence. For our sharing outcomes, there is no clear pattern, with Facebook showing a small but significant interaction in the opposite direction.
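
To make explicit how the two conditional effects follow from the regression coefficients, the following minimal sketch uses simulated data. The variable names (`source_real`, `congruent`, `belief`) and the effect sizes are illustrative assumptions; they do not reproduce the study's data or estimates.

```r
# Illustrative sketch with simulated data; names and effect sizes are
# assumptions, not the study's variables or results.
set.seed(1)
n <- 2000
source_real <- rbinom(n, 1, 0.5) # 0 = fake source, 1 = real source
congruent <- rbinom(n, 1, 0.5)   # 0 = incongruent, 1 = congruent
belief <- 3 + 0.8 * source_real + 0.4 * congruent -
  0.4 * source_real * congruent + rnorm(n)

fit <- lm(belief ~ source_real * congruent)
b <- coef(fit)

# Effect of congruence when source_real = 0 (fake source)
effect_fake <- unname(b["congruent"])

# Effect of congruence when source_real = 1 (real source):
# main effect plus interaction coefficient
effect_real <- unname(b["congruent"] + b["source_real:congruent"])

# In a saturated 2x2 model these equal the raw differences in cell means
cell_means <- tapply(belief, list(source_real, congruent), mean)
diff_fake <- cell_means["0", "1"] - cell_means["0", "0"]
diff_real <- cell_means["1", "1"] - cell_means["1", "0"]
```

Because the model contains both main effects and their interaction, the congruence effect for each source is simply the difference between the corresponding two group means.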

```{r initial-belief-difference}

# Here we test whether the difference in belief in Report 1 between the congruent and incongruent treatments is larger than what we would expect by chance

## T-tests for null hypothesis: No average difference between those assigned to congruent and incongruent treatment, both in Tagesschau and Nachrichten360 treatment

initial_diff_n360 <- t.test(belief_report_1_num ~ treatment_congruence50,
  data = data[data$treatment_source == 0, ]
)

initial_diff_ts <- t.test(belief_report_1_num ~ treatment_congruence50,
  data = data[data$treatment_source == 1, ]
)
```

Since we measured belief in both Report 1 and Report 5, we can interpret the results in terms of changing credibility, as illustrated in Figure \@ref(fig-6). The difference in belief in Report 1 between the two sources (*Nachrichten 360* and *Tagesschau*) was due to the higher credibility of the real source (*Tagesschau*). Subsequently, for the real source (*Tagesschau*) it does not matter much whether readers receive congruent or incongruent reports: the non-difference for Report 5 shows that credibility perceptions did not change for the real source. In contrast, respondents had a higher tendency to believe Report 5 by the fake source (*Nachrichten 360*) if it had provided them with congruent news. The fake source thus gained in credibility.

Note that Figure \@ref(fig-6) gives the impression that belief increases *overall* between Reports 1 and 5. One reason could be that Report 1 and Report 5 are not equally plausible: perhaps Report 5 was simply easier to square with people's pre-existing political knowledge than Report 1. It could also reflect a "mere exposure" effect: subjects get used to the source and trust it more, whatever it does. In any case, the difference between the congruence groups for the fake source can only be attributed to the randomly assigned exposure to previous reports.^[Note further that Figure \@ref(fig-6) also shows slight differences in belief between the congruence and incongruence treatment groups *before* the treatment is delivered. To assess the size of these initial differences, we ran t-tests within the source treatment groups comparing belief in Report 1 of subjects who received the congruent and incongruent treatments. For both the fake (p = `r round(initial_diff_n360$p.value, 3)`) and the real source (p = `r round(initial_diff_ts$p.value, 3)`), the initial differences at Report 1 are likely due to chance. We also reran the main analyses using the subsample of respondents who passed all our manipulation checks and found that the initial differences at Report 1 are much smaller in this subsample, as visualized in Figure \@ref(fig-A11).]

```{r fig-5, echo=FALSE, fig.cap="Interactions between source and congruence treatment\\label{fig-5}", fig.height=6, fig.width=11, message=FALSE, warning=FALSE, out.extra=''}

# Belief #####

data_plot <- data %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(belief_report_5_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

p1 <- ggplot(
  data_plot,
  aes(
    y = mean,
    x = treatment_source,
    group = as.factor(treatment_congruence50),
    color = as.factor(treatment_source),
    linetype = as.factor(treatment_congruence50)
  )
) +
  geom_point(position = position_dodge(0.6), size = 0.8) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .5,
    position = position_dodge(0.6), size = 0.6
  ) +
  labs(title = "(A) Belief ", x = "", y = "Belief Report 5 (0-6)") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Congruence treatment:"
  ) +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (F)", "Tagesschau (R)"),
    name = "Source treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 13),
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm")
  ) +
  ylim(0, 6) +
  xlim(-0.5, 1.5) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 5.5,
    label = paste("Interaction coef. = ", round(summary(h3a_lm)$coefficients[4, 1], 2), "\np = ", round(summary(h3a_lm)$coefficients[4, 4], 2), ", N = ", nobs(h3a_lm)),
    color = "#575757", size = 4
  )

# Sharing Email ####

data_plot <- data %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(share_report_5_email_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

p2 <- ggplot(
  data_plot,
  aes(
    y = mean,
    x = treatment_source,
    group = as.factor(treatment_congruence50),
    color = as.factor(treatment_source),
    linetype = as.factor(treatment_congruence50)
  )
) +
  geom_point(position = position_dodge(0.6), size = 0.8) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .4,
    position = position_dodge(0.6), size = 0.3
  ) +
  labs(title = "(B) Sharing intention Email ", x = "", y = "Share Report 5 (0,1)") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Treatment:"
  ) +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (F)", "Tagesschau (R)"),
    name = "Treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.position = "none",
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.5, 0.25, -0.25, 0, "cm")
  ) +
  ylim(0, 1) +
  xlim(-0.5, 1.5) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 0.85,
    label = paste("Interaction coef. = ", round(summary(h3b_email_lm)$coefficients[4, 1], 2), "\np = ", round(summary(h3b_email_lm)$coefficients[4, 4], 2), ", N = ", nobs(h3b_email_lm)),
    color = "#575757", size = 3.5
  )

# Facebook ####

data_plot <- data %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(share_report_5_fb_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

p3 <- ggplot(
  data_plot,
  aes(
    y = mean,
    x = treatment_source,
    group = as.factor(treatment_congruence50),
    color = as.factor(treatment_source),
    linetype = as.factor(treatment_congruence50)
  )
) +
  geom_point(position = position_dodge(0.6), size = 0.8) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .4,
    position = position_dodge(0.6), size = 0.3
  ) +
  labs(title = "(C) Sharing intention Facebook ", x = "", y = "") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Treatment:"
  ) +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (F)", "Tagesschau (R)"),
    name = "Treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.position = "none",
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.5, 0.5, -0.25, -0.25, "cm")
  ) +
  ylim(0, 1) +
  xlim(-0.5, 1.5) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 0.85,
    label = paste("Interaction coef. = ", round(summary(h3b_fb_lm)$coefficients[4, 1], 2), "\np = ", round(summary(h3b_fb_lm)$coefficients[4, 4], 2), ", N = ", nobs(h3b_fb_lm)),
    color = "#575757", size = 3
  )

# Twitter ####

data_plot <- data %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(share_report_5_twitter_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

p4 <- ggplot(
  data_plot,
  aes(
    y = mean,
    x = treatment_source,
    group = as.factor(treatment_congruence50),
    color = as.factor(treatment_source),
    linetype = as.factor(treatment_congruence50)
  )
) +
  geom_point(position = position_dodge(0.6), size = 0.8) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .4,
    position = position_dodge(0.6), size = 0.3
  ) +
  labs(title = "(D) Sharing intention Twitter ", x = "", y = "Sharing Report 5 (0,1)") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Treatment:"
  ) +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (F)", "Tagesschau (R)"),
    name = "Treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.position = "none",
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.25, 0.25, 0, 0, "cm")
  ) +
  ylim(0, 1) +
  xlim(-0.5, 1.5) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 0.85,
    label = paste("Interaction coef. = ", round(summary(h3b_twitter_lm)$coefficients[4, 1], 2), "\np = ", round(summary(h3b_twitter_lm)$coefficients[4, 4], 2), ", N = ", nobs(h3b_twitter_lm)),
    color = "#575757", size = 3.5
  )

# Whatsapp ####

data_plot <- data %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(share_report_5_whatsapp_num) %>%
  group_by(N = n(), add = TRUE) %>%
  summarize_all(funs(mean, var, sd, na_sum = sum(is.na(.))), na.rm = TRUE) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

p5 <- ggplot(
  data_plot,
  aes(
    y = mean,
    x = treatment_source,
    group = as.factor(treatment_congruence50),
    color = as.factor(treatment_source),
    linetype = as.factor(treatment_congruence50)
  )
) +
  geom_point(position = position_dodge(0.6), size = 0.8) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .4,
    position = position_dodge(0.6), size = 0.3
  ) +
  labs(title = "(E) Sharing intention Whatsapp ", x = "", y = "") +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("Incongruent", "Congruent"),
    name = "Treatment:"
  ) +
  scale_color_manual(
    values = c("red", "blue"),
    labels = c("Nachrichten 360 (F)", "Tagesschau (R)"),
    name = "Treatment:"
  ) +
  theme_light() +
  theme(
    axis.text.x = element_blank(),
    axis.title.y = element_text(size = 13),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.position = "none",
    plot.title = element_text(size = 14, hjust = 0.5),
    plot.margin = margin(0.25, 0.5, 0, -0.25, "cm")
  ) +
  ylim(0, 1) +
  xlim(-0.5, 1.5) +
  ggplot2::annotate(
    geom = "text", x = 0.5, y = 0.85,
    label = paste("Interaction coef. = ", round(summary(h3b_whatsapp_lm)$coefficients[4, 1], 2), "\np = ", round(summary(h3b_whatsapp_lm)$coefficients[4, 4], 2), ", N = ", nobs(h3b_whatsapp_lm)),
    color = "#575757", size = 3.5
  )

graph_sharing <- ggarrange(p2, p3, p4, p5,
  ncol = 2,
  nrow = 2
)

graph_combined <- ggarrange(p1, graph_sharing,
  common.legend = TRUE,
  legend = "bottom",
  ncol = 2,
  widths = c(2.5, 3)
)


annotate_figure(graph_combined,
    bottom = text_grob("Note: Plots show point estimates and 95% confidence intervals for average belief and sharing intentions per source by congruence treatment group.\nOLS coefficients and p-values for the interaction between the two treatments are listed in the top part of the plots.", color = "black",
                                  hjust = 0, x = 0, size = 11)
)
```


```{r fig-6, echo=FALSE, fig.cap="Source credibility development\\label{fig-6}", fig.height=5, fig.width=9, message=FALSE, warning=FALSE, out.extra=''}

credibility_development <- data %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(
    belief_report_1_num,
    belief_report_5_num
  ) %>%
  group_by(N = n(), add = TRUE) %>%
  gather(
    key = "variable", value = "value",
    c("belief_report_1_num", "belief_report_5_num")
  ) %>%
  group_by(treatment_congruence50, treatment_source, variable) %>%
  summarize(
    N = mean(N),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci_error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf_low = mean - ci_error,
    conf_high = mean + ci_error
  ) %>%
  ungroup() %>%
  mutate(
    treatment_source =
      case_when(
        treatment_source == 0 ~ "Nachrichten 360",
        treatment_source == 1 ~ "Tagesschau"
      )
  ) %>%
  mutate(treatment_source = factor(treatment_source,
    ordered = TRUE,
    levels = c(
      "Nachrichten 360",
      "Tagesschau"
    )
  )) %>%
  mutate(
    treatment_congruence50 =
      case_when(
        treatment_congruence50 == 0 ~ "Incongruent",
        treatment_congruence50 == 1 ~ "Congruent"
      )
  ) %>%
  mutate(treatment_congruence50 = factor(treatment_congruence50,
    ordered = TRUE,
    levels = c(
      "Incongruent",
      "Congruent"
    )
  )) %>%
  mutate(variable = case_when(
    variable == "belief_report_1_num" ~ 1,
    variable == "belief_report_5_num" ~ 5
  )) %>%
  rename(
    "Groups_Congruence_treatment" = treatment_congruence50,
    "Groups_Source_treatment" = treatment_source
  )

p <- ggplot(
  credibility_development,
  aes(
    x = variable,
    y = mean,
    group = interaction(Groups_Source_treatment, Groups_Congruence_treatment),
    color = Groups_Source_treatment,
    linetype = Groups_Congruence_treatment
  )
) +
  geom_point(size = 1) +
  geom_path() +
  geom_errorbar(aes(ymin = conf_low, ymax = conf_high),
    width = .1
  ) +
  ylim(0, 6) +
  labs(y = "Belief (0-6)", x = "") +
  scale_color_manual(
    name = "Treatment: Source",
    values = c("red", "blue")
  ) +
  scale_linetype_manual(
    name = "Treatment: Congruence",
    values = c("dashed", "solid")
  ) +
  # Set x limits here rather than via xlim(): a second x scale would replace the first
  scale_x_continuous(
    limits = c(0.75, 5.25),
    breaks = c(1, 5),
    minor_breaks = c(2, 3, 4),
    labels = c("Report 1", "Report 5")
  ) +
  theme_light() +
  theme(
    axis.title.y = element_text(size = 14),
    axis.text.x = element_text(size = 14),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 13),
    plot.title = element_text(size = 15, hjust = 0.5),
    plot.margin = margin(0, 0, 0, 0.5, "cm")
  ) +
  ggplot2::annotate(
    geom = "text", x = 3, y = 0.5,
    label = "Reports 2-4: Content manipulated, \nbelief not measured.",
    color = "gray",
    size = 4
  )


annotate_figure(p,
  bottom = text_grob("Note: Plots show point estimates and 95% confidence intervals for average belief per treatment group for both Report 1 and Report 5.\nLines connect each of four treatment groups over time.",
    color = "black",
    hjust = 0, x = 0, size = 11
  )
)
```



## Treatment heterogeneity {#sec:resultsheterogeneity}

We now turn to the question of treatment heterogeneity for the two treatment dimensions, i.e., source and congruence. We estimate heterogeneity using the causal forest method [cf. @Wager2018-tk]. A detailed description of this approach can be found in Appendix \@ref(sec:thoverview). Results for the belief outcome are described in this section; results for sharing outcomes can be found in Appendix \@ref(sec:resultsheterogeneitysharing).

Following @AtheyWager2019, we generally start by growing a pilot causal forest based on all covariates described in our methods section. The variables included are gender, age, state of residence (coded as dummy for Eastern Germany), mainstream media knowledge, mainstream media trust, political knowledge, overconfidence, use of email/Facebook/Twitter/Whatsapp (coded as dummies), sharing frequency, education, turnout at last election, vote choice at last election (coded as party dummies), income, and basic programming knowledge. We then regrow the causal forest including only the variables with above-average importance in the pilot forest [cf. @AtheyWager2019].^[Importance is calculated as a simple weighted sum of how many times a variable was split on at each depth in the forest. Importance statistics for all variables can be found in Appendix \@ref(sec:treatmentheterogeneity).] For this final forest, we predict individual-level treatment effects, i.e., the difference between being exposed to one treatment compared to the other, with out-of-bag prediction.
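
The selection rule can be illustrated with a minimal sketch in base R. The split counts below are invented for illustration, and the depth weights (depth to the power of -2, so that splits near the root count more) are an assumption about the weighting scheme rather than a statement of the exact settings used in the analysis.

```r
# Hypothetical split counts: rows = tree depth (1 to 4), columns = covariates.
# All numbers are invented for illustration only.
split_counts <- rbind(
  c(trust = 60, age = 25, income = 10, gender = 5),
  c(trust = 40, age = 30, income = 20, gender = 10),
  c(trust = 30, age = 30, income = 25, gender = 15),
  c(trust = 25, age = 25, income = 25, gender = 25)
)

# Share of splits falling on each covariate at each depth
split_share <- split_counts / rowSums(split_counts)

# Assumed depth weights: splits near the root receive more weight
depth_weight <- seq_len(nrow(split_share))^(-2)

# Importance = weighted average of the split shares across depths
importance <- drop(t(split_share) %*% depth_weight) / sum(depth_weight)

# Keep only covariates whose importance exceeds the mean importance
selected <- names(importance)[importance > mean(importance)]
```

In this toy example only `trust` and `age` exceed the average importance and would be carried forward to the final forest.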

```{r causal-forest-settings, include = FALSE}

# Set causal forest parameters

set.seed(1839)
set_seed <- 1839
number_of_trees <- 30000
tree_depth <- 4

# Set covariates used for growing forest

all.idx <- c(
  "sex_num",
  "age",
  "federal_state_west",
  "education_num",
  "income_num",
  "turnout_num",
  "vote_choice_leftparty",
  "vote_choice_greens",
  "vote_choice_spd",
  "vote_choice_cdu_csu",
  "vote_choice_fdp",
  "vote_choice_afd_num",
  "vote_choice_dont_know",
  "know_html_correct",
  "know_source_mainstream",
  "sharing_frequency",
  "trust_source_mainstream",
  "use_email_num",
  "use_fb_num",
  "use_twitter_num",
  "use_whatsapp_num",
  "know_politicians_total",
  "overconfidence"
)

continuous_vars <- c(
  "trust_source_mainstream", "know_source_mainstream",
  "age", "overconfidence", "sharing_frequency", "know_politicians_total"
)
```

```{r fig-7, echo=FALSE, fig.cap="Heterogeneity of predicted source treatment effect on belief\\label{fig-7}", fig.height=5, fig.width=7, message=FALSE, warning=FALSE, out.extra=''}

# Pilot forest

## Subset data

data_heterogeneity <-
  data %>%
  select(
    belief_report_1_num,
    treatment_source,
    all.idx
  ) %>%
  na.omit() %>%
  data.frame()

## Define treatment, outcome and covariates

X <- data_heterogeneity %>% select(-belief_report_1_num, -treatment_source)
W <- data_heterogeneity$treatment_source
Y <- data_heterogeneity$belief_report_1_num

## Estimate pilot forest

tau.forest <- causal_forest(
  X = X,
  Y = Y,
  W = W,
  num.trees = number_of_trees
)

## Variable Importance pilot forest

variable_importance <- tau.forest %>%
  variable_importance(max.depth = tree_depth) %>%
  as.data.frame() %>%
  mutate(variable = colnames(tau.forest$X.orig)) %>%
  arrange(desc(V1))

## Select variables with importance higher than average

selected.idx <- variable_importance %>%
  filter(V1 > mean(V1)) %>%
  select(variable) %>%
  pull()

# Efficient forest with most important variables ####

## Subset data

data_heterogeneity <-
  data %>%
  select(
    belief_report_1_num,
    treatment_source,
    selected.idx
  ) %>%
  na.omit() %>%
  data.frame()

## Define treatment, outcome and covariates

X <- data_heterogeneity %>% select(-belief_report_1_num, -treatment_source)
W <- data_heterogeneity$treatment_source
Y <- data_heterogeneity$belief_report_1_num

## Estimate forest

tau.forest <- causal_forest(
  X = X,
  Y = Y,
  W = W,
  num.trees = number_of_trees
)

## Recalculate importance

variable_importance <- tau.forest %>%
  variable_importance(max.depth = tree_depth) %>%
  as.data.frame() %>%
  mutate(variable = colnames(tau.forest$X.orig)) %>%
  arrange(desc(V1))

## Predict individual-level treatment effects

tau.hat <- predict(tau.forest, estimate.variance = TRUE)

# Omnibus tests for heterogeneity ####

## Compare regions with high and low predictions

high_effect <- tau.hat[, 1] > median(tau.hat[, 1])
ate.high <- average_treatment_effect(tau.forest, subset = high_effect)
ate.low <- average_treatment_effect(tau.forest, subset = !high_effect)
regions_source_belief_diff <- round(ate.high[1] - ate.low[1], 2)
regions_source_belief_se <- round(sqrt(ate.high[2]^2 + ate.low[2]^2), 2)

## Best linear predictor (calibration test)

calibr_source_belief <- test_calibration(tau.forest)
calibr_source_belief_mean <- format(round(calibr_source_belief[1], 2), nsmall = 2)
calibr_source_belief_diff <- format(round(calibr_source_belief[2], 2), nsmall = 2)
calibr_source_belief_diff_p <- format(round(calibr_source_belief[8], 3), nsmall = 3)

# Extend variable importance df for Appendix table and plots ####

variable_importance <- variable_importance %>%
  mutate(
    Covariate =
      recode(variable,
        "trust_source_mainstream" = "Mainstr. media trust",
        "age" = "Age",
        "know_politicians_total" = "Political knowledge",
        "income_num" = "Income",
        "education_num" = "Education",
        "know_source_mainstream" = "Mainstr. media knowl.",
        "overconfidence" = "Overconfidence",
        "know_html_correct" = "Knowing html",
        "sex_num" = "Gender",
        "federal_state_west" = "West Germany",
        "use_email_num" = "Email usage",
        "use_fb_num" = "Facebook use",
        "use_twitter_num" = "Twitter usage",
        "use_whatsapp_num" = "Whatsapp use",
        "sharing_frequency" = "Sharing frequency",
        "turnout_num" = "Vote participation",
        "(Intercept)" = "Intercept",
        "vote_choice_leftparty" = "Vote choice LeftParty",
        "vote_choice_greens" = "Vote choice Greens",
        "vote_choice_spd" = "Vote choice SPD",
        "vote_choice_cdu_csu" = "Vote choice CDU/CSU",
        "vote_choice_fdp" = "Vote choice FDP",
        "vote_choice_afd_num" = "Vote choice AfD",
        "vote_choice_dont_know" = "Vote choice Dont know"
      )
  ) %>%
  mutate(Outcome = "Belief Report 1") %>%
  mutate(Treatment = "Source treatment") %>%
  rename(Importance = V1) %>%
  mutate(type = ifelse(variable %in% continuous_vars, "continuous", "categorical"))

## Variable importance table for Appendix

var_imp_source_belief <- variable_importance %>%
  select(Treatment, Outcome, Covariate, Importance)

# Best linear projection test (blpht) ####

blpht_source_belief <- best_linear_projection(tau.forest, X)
blpht_source_belief_dt <- as.data.frame(as.table(blpht_source_belief)) %>%
  pivot_wider(names_from = "Var2", values_from = "Freq") %>%
  rename("variable" = "Var1")

# Plots per covariate ####

## Bind data with predictions

data_heterogeneity <- bind_cols(data_heterogeneity, tau.hat)

## Plots loop

for (i in seq_len(nrow(variable_importance))) {
  var <- variable_importance$variable[i]
  var_label <- variable_importance$Covariate[i]
  plot_number <- LETTERS[i]

  # Define variables that need diagonal labels
  if (var %in% c("income_num", "education_num")) {
    angle <- 45
  } else {
    angle <- 0
  }

  data_plot <- data_heterogeneity %>% select(var, predictions)

  if (variable_importance$type[i] == "continuous") {
    p <- ggplot(data_plot, aes_string(
      x = as.name(var),
      y = as.name("predictions")
    )) +
      geom_point(alpha = 3 / 10) +
      geom_smooth(method = "loess", span = 1, se = F, colour = "gray") +
      labs(title = paste0("(", plot_number, ") ", var_label)) +
      theme_light() +
      theme(
        axis.text.x = element_text(size = 6, angle = angle),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        plot.title = element_text(size = 8)
      )
  } else {
    data_plot[, var] <- factor(round(data_plot[, var]))

    labels <- data %>%
      select(gsub("_num", "_fac", var)) %>%
      table() %>%
      rownames()

    p <- ggplot(data_plot, aes_string(
      x = as.name(var),
      y = as.name("predictions")
    )) +
      geom_boxplot() +
      geom_smooth(method = "loess", se = FALSE, aes(group = 1), colour = "gray") +
      labs(title = paste0("(", plot_number, ") ", var_label)) +
      scale_x_discrete(labels = labels) +
      theme_light() +
      theme(
        axis.text.x = element_text(
          size = 6, angle = angle,
          hjust = 1, vjust = 1
        ),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        plot.title = element_text(size = 8)
      )
  }

  assign(paste("p", i, sep = ""), p)
}

# plots_combined <- grid.arrange(arrangeGrob(p1, p2, p3, p4, p5, p6, p7, p8, ncol = 4),
#   left = grid::textGrob("Predicted source treatment effect",
#     rot = 90, vjust = 1
#   )
# )
# 
# 
# annotate_figure(plots_combined,
#   bottom = text_grob("Note: Plots display variation of predicted individual treatment effects across values of continuous\n(scatter plots) and categorical (box plots) covariates (the 8 most important in terms of treatment heterogeneity).\nThe gray curve is a simple smoothing curve.",
#     color = "black",
#     hjust = 0, x = 0, size = 9
#   )
# )

plots_combined2 <- ggarrange(p1, p2, p3, p4, p5, p6, p7, p8,
  common.legend = TRUE,
  legend = "bottom",
  ncol = 4,
  nrow = 2
)


# Annotate the figure by adding common labels
annotate_figure(plots_combined2,
  left = text_grob("Predicted source treatment effect", color = "black", rot = 90),
  bottom = text_grob("Note: Plots display variation of predicted individual treatment effects across values of continuous\n(scatter plots) and categorical (box plots) covariates (the 8 most important in terms of treatment heterogeneity).\nThe gray curve is a simple smoothing curve.",
    color = "black",
    hjust = 0, x = 0, size = 9
  )
)





tau.forest.example.tree <- tau.forest # Save to build an example tree later on
```

The resulting causal forest for the source treatment is built with N = `r nrow(data_heterogeneity)` observations on eight covariates. Before discussing the significance of each covariate, we report the results of two omnibus tests of whether heterogeneity is present at all. Again following @AtheyWager2019, we first compare regions with high and low estimates of individual treatment effects, separated by the median. In the absence of heterogeneity, we should not find large differences across these regions. In our case, there is a significant difference of `r regions_source_belief_diff` (SE = `r regions_source_belief_se`) between the averages of these two regions. A second omnibus test relies on the best linear predictor method, which fits the conditional average treatment effect as a linear function of the causal forest estimates. The significant coefficients (p = `r calibr_source_belief_diff_p`) suggest that heterogeneity is present (see Appendix \@ref(sec:treatmentheterogeneity) for details).
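
The arithmetic of the first omnibus test can be sketched as follows. The two subgroup estimates and their standard errors are invented numbers, not the study's results, and the standard error of the difference treats the two estimates as independent, which is an approximation.

```r
# Hypothetical subgroup estimates (invented numbers, not the study's results)
ate_high <- c(estimate = 1.10, std.err = 0.12) # above-median predicted effects
ate_low <- c(estimate = 0.55, std.err = 0.11)  # below-median predicted effects

# Difference between the two subgroup averages
diff_est <- unname(ate_high["estimate"] - ate_low["estimate"])

# Standard error of the difference, treating the estimates as independent
diff_se <- unname(sqrt(ate_high["std.err"]^2 + ate_low["std.err"]^2))

# Approximate 95% confidence interval under a normal approximation
ci <- diff_est + c(-1, 1) * qnorm(0.975) * diff_se
```

If the resulting interval excludes zero, the average effects in the two regions differ by more than sampling noise alone would suggest, which points to heterogeneity.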

Figure \@ref(fig-7) visualizes heterogeneity by plotting predictions against the set of eight covariates. Recall that the average treatment effect of the source treatment on belief was `r source_effect_belief`. This effect varies considerably with a few covariates. For example, Panel A shows that, for subjects expressing high trust in mainstream media, the predicted difference between real and fake source is about 0.75 points higher than for those with low mainstream media trust. Panel B shows that the real source made a greater impression on those who did not vote for the right-wing populist party AfD. The remaining plots suggest that the source effect is weakest for middle-aged people (C), for those least politically knowledgeable (D), for those who know few mainstream sources (E), for those with lower incomes (F), and for those overly confident in their political knowledge (G). The treatment effect does not seem to vary much with individual sharing frequency. To test whether these relations are significant, we run a best linear projection of predictions on the set of covariates [cf. @Tibshirani2020-uw]. It shows that only three variables are statistically significant at the 5% level: mainstream media trust (p = `r round(as.numeric(blpht_source_belief_dt[blpht_source_belief_dt$variable == "trust_source_mainstream", "Pr(>|t|)"]), 3)`), vote for the populist right (p = `r round(as.numeric(blpht_source_belief_dt[blpht_source_belief_dt$variable == "vote_choice_afd_num", "Pr(>|t|)"]), 3)`), and political knowledge (p = `r round(as.numeric(blpht_source_belief_dt[blpht_source_belief_dt$variable == "know_politicians_total", "Pr(>|t|)"]), 3)`). Age and media knowledge show marginal significance levels (see Table \@ref(tab:tab-A27) in the Appendix for more details).



```{r fig-8, echo=FALSE, fig.cap="Heterogeneity of predicted congruence treatment effect on belief\\label{fig-8}", fig.height=5, fig.width=7, message=FALSE, warning=FALSE, out.extra=''}

# Pilot forest

## Subset data

data_heterogeneity <-
  data %>%
  filter(treatment_source == 0) %>% # Filter out 'real-source treatment'
  select(
    belief_report_5_num,
    treatment_congruence50,
    all_of(all.idx) # all_of() makes selection by an external character vector explicit
  ) %>%
  na.omit() %>%
  data.frame()

## Define treatment, outcome and covariates

X <- data_heterogeneity %>% select(-belief_report_5_num, -treatment_congruence50)
W <- data_heterogeneity$treatment_congruence50
Y <- data_heterogeneity$belief_report_5_num

## Estimate pilot forest

tau.forest <- causal_forest(
  X = X,
  Y = Y,
  W = W,
  num.trees = number_of_trees
)

## Variable Importance pilot forest

variable_importance <- tau.forest %>%
  variable_importance(max.depth = tree_depth) %>%
  as.data.frame() %>%
  mutate(variable = colnames(tau.forest$X.orig)) %>%
  arrange(desc(V1))

## Select variables with importance higher than average

selected.idx <- variable_importance %>%
  filter(V1 > mean(V1)) %>%
  select(variable) %>%
  pull()

# Efficient forest with most important variables ####

## Subset data

data_heterogeneity <-
  data %>%
  filter(treatment_source == 0) %>% # Filter out real source treatment
  select(
    belief_report_5_num,
    treatment_congruence50,
    all_of(selected.idx) # all_of() makes selection by an external character vector explicit
  ) %>%
  na.omit() %>%
  data.frame()

## Define treatment, outcome and covariates

X <- data_heterogeneity %>% select(-belief_report_5_num, -treatment_congruence50)
W <- data_heterogeneity$treatment_congruence50
Y <- data_heterogeneity$belief_report_5_num

## Estimate forest

tau.forest <- causal_forest(
  X = X,
  Y = Y,
  W = W,
  num.trees = number_of_trees
)

## Recalculate importance

variable_importance <- tau.forest %>%
  variable_importance(max.depth = tree_depth) %>%
  as.data.frame() %>%
  mutate(variable = colnames(tau.forest$X.orig)) %>%
  arrange(desc(V1))

## Predict individual-level treatment effects

tau.hat <- predict(tau.forest, estimate.variance = TRUE)

# Omnibus tests for heterogeneity ####

## Compare regions with high and low predictions

high_effect <- tau.hat[, 1] > median(tau.hat[, 1])
ate.high <- average_treatment_effect(tau.forest, subset = high_effect)
ate.low <- average_treatment_effect(tau.forest, subset = !high_effect)
regions_congruence_belief_diff <- round(ate.high[1] - ate.low[1], 3)
regions_congruence_belief_se <- round(sqrt(ate.high[2]^2 + ate.low[2]^2), 3)

## Best linear predictor (calibration test)

calibr_congruence_belief <- test_calibration(tau.forest)
calibr_congruence_belief_mean <- format(round(calibr_congruence_belief[1], 2), nsmall = 2)
calibr_congruence_belief_diff <- format(round(calibr_congruence_belief[2], 2), nsmall = 2)
calibr_congruence_belief_diff_p <- format(round(calibr_congruence_belief[8], 3), nsmall = 3)

# Extend variable importance df for Appendix table and plots ####

variable_importance <- variable_importance %>%
  mutate(
    Covariate =
      recode(variable,
        "trust_source_mainstream" = "Mainstream media trust",
        "age" = "Age",
        "know_politicians_total" = "Political knowledge",
        "income_num" = "Income",
        "education_num" = "Education",
        "know_source_mainstream" = "Mainstream media knowledge",
        "overconfidence" = "Overconfidence",
        "know_html_correct" = "Knowing html",
        "sex_num" = "Gender",
        "federal_state_west" = "West Germany",
        "use_email_num" = "Email usage",
        "use_fb_num" = "Facebook use",
        "use_twitter_num" = "Twitter usage",
        "use_whatsapp_num" = "Whatsapp use",
        "sharing_frequency" = "Sharing frequency",
        "turnout_num" = "Vote participation",
        "(Intercept)" = "Intercept",
        "vote_choice_leftparty" = "Vote choice LeftParty",
        "vote_choice_greens" = "Vote choice Greens",
        "vote_choice_spd" = "Vote choice SPD",
        "vote_choice_cdu_csu" = "Vote choice CDU/CSU",
        "vote_choice_fdp" = "Vote choice FDP",
        "vote_choice_afd_num" = "Vote choice AfD",
        "vote_choice_dont_know" = "Vote choice Dont know"
      )
  ) %>%
  mutate(Outcome = "Belief Report 5") %>%
  mutate(Treatment = "Congruence treatment") %>%
  rename(Importance = V1) %>%
  mutate(type = ifelse(variable %in% continuous_vars, "continuous", "categorical"))

## Variable importance table for Appendix

var_imp_congruence_belief <- variable_importance %>%
  select(Treatment, Outcome, Covariate, Importance)

# Best linear projection heterogeneity test (blpht) ####

blpht_congruence_belief <- best_linear_projection(tau.forest, X)
blpht_congruence_belief_dt <- as.data.frame(as.table(blpht_congruence_belief)) %>%
  pivot_wider(names_from = "Var2", values_from = "Freq") %>%
  rename("variable" = "Var1")

# Plots per covariate ####

## Bind data with predictions

data_heterogeneity <- bind_cols(data_heterogeneity, tau.hat)

## Plots loop

for (i in seq_len(nrow(variable_importance))) {
  var <- variable_importance$variable[i]
  var_label <- variable_importance$Covariate[i]
  plot_number <- LETTERS[seq(from = 1, to = nrow(variable_importance))][i]

  # Define variables that need diagonal labels
  if (var %in% c("income_num", "education_num")) {
    angle <- 45
  } else {
    angle <- 0
  }

  data_plot <- data_heterogeneity %>% select(all_of(var), predictions)

  if (variable_importance$type[i] == "continuous") {
    p <- ggplot(data_plot, aes_string(
      x = as.name(var),
      y = as.name("predictions")
    )) +
      geom_point(alpha = 3 / 10) +
      geom_smooth(method = "loess", span = 1, se = FALSE, colour = "gray") +
      labs(title = paste0("(", plot_number, ") ", var_label)) +
      theme_light() +
      theme(
        axis.text.x = element_text(size = 6, angle = angle),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        plot.title = element_text(size = 8)
      )
  } else {
    data_plot[, var] <- factor(round(data_plot[, var]))

    labels <- data %>%
      select(gsub("_num", "_fac", var)) %>%
      table() %>%
      rownames()

    p <- ggplot(data_plot, aes_string(
      x = as.name(var),
      y = as.name("predictions")
    )) +
      geom_boxplot() +
      geom_smooth(method = "loess", se = FALSE, aes(group = 1), colour = "gray") +
      labs(title = paste0("(", plot_number, ") ", var_label)) +
      scale_x_discrete(labels = labels) +
      theme_light() +
      theme(
        axis.text.x = element_text(
          size = 6, angle = angle,
          hjust = 1, vjust = 1
        ),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        plot.title = element_text(size = 8)
      )
  }

  assign(paste("p", i, sep = ""), p)
}




plots_combined <- ggarrange(p1, p2, p3, p4, p5, p6, p7, p8,
  common.legend = TRUE,
  legend = "bottom",
  ncol = 4,
  nrow = 2
)


# Annotate the figure by adding common labels
annotate_figure(plots_combined,
  left = text_grob("Predicted congruence treatment", color = "black", rot = 90),
  bottom = text_grob("Note: Plots display variation of predicted individual treatment effects across values of continuous\n(scatter plots) and categorical (box plots) covariates (the eight most important in terms of treatment heterogeneity).\nThe gray curve is a simple smoothing curve.",
    color = "black",
    hjust = 0, x = 0, size = 9
  )
)


```

To test treatment heterogeneity for the congruence effect, we restricted the data to subjects in the fake source treatment, as it is here that we found a congruence effect. We grew a causal forest as described above, based on N = `r nrow(data_heterogeneity)` observations and, again, eight covariates with above-average importance. The set of most important covariates is similar to those important for the source effect. The two omnibus tests described above suggest that the heterogeneity found is real: the average difference between the high-prediction region and the low-prediction region is `r regions_congruence_belief_diff` (SE = `r regions_congruence_belief_se`). The calibration test reveals a significant coefficient  (p = `r calibr_congruence_belief_diff_p`). 
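
The arithmetic behind the comparison of the two regions can be sketched as follows. The numbers are hypothetical placeholders, not our estimates, and the standard error formula treats the two subgroup estimates as independent, which holds only approximately because both regions are defined from the same forest predictions.

```r
# Hypothetical subgroup estimates in the format returned by
# grf::average_treatment_effect(): a named vector with estimate and std.err
ate_high <- c(estimate = 0.60, std.err = 0.15)
ate_low  <- c(estimate = 0.10, std.err = 0.14)

# Difference between the high- and low-prediction regions
diff_est <- unname(ate_high["estimate"] - ate_low["estimate"])

# Standard error of the difference, assuming independent estimates
diff_se <- unname(sqrt(ate_high["std.err"]^2 + ate_low["std.err"]^2))

# Approximate z statistic and 95% confidence interval
z_stat <- diff_est / diff_se
ci_95  <- diff_est + c(-1, 1) * qnorm(0.975) * diff_se

round(c(difference = diff_est, se = diff_se, z = z_stat), 3)
round(ci_95, 3)
```

A confidence interval that excludes zero is read as evidence that the forest separates subjects with larger effects from those with smaller effects.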

Figure \@ref(fig-8) visualizes heterogeneity of the congruence effect against the included covariates. Recall that the average treatment effect was `r fakesource_congruence_effect`. For example, Plot (B) suggests that the congruence effect is larger for middle-aged people. The other plots suggest substantial heterogeneity for vote for the populist right, income, political knowledge, and education. Again, we use the best linear projection method to test whether covariates significantly predict heterogeneity. Only having voted for the populist right turns out to be statistically significant at the 95% level (p = `r round(as.numeric(blpht_congruence_belief_dt[blpht_congruence_belief_dt$variable == "vote_choice_afd_num", "Pr(>|t|)"]), 3)`; see Table \@ref(tab:tab-A27) in the Appendix for more details).



These results imply some interesting patterns: by and large, there is an inverse relation between the heterogeneity of the source effect and of the congruence effect. For example, those not having voted for the right-wing populist AfD give comparatively greater credit to the real source than to the fake source, but they are less influenced by a source providing them congruent information. A similar conclusion could be drawn about mainstream source trust, political knowledge, and income, although we have less confidence about congruence effect heterogeneity on these covariates.

We also explored the heterogeneity of both treatments for the sharing outcomes. However, neither omnibus test allows us to reject the hypothesis of no heterogeneity, so we do not further explore the significance of individual covariates. This somewhat reflects the much less clear-cut findings for sharing in the main analyses and could potentially also be due to the fact that our samples become small in the high-dimensional space defined by our covariates, as discussed below. See Appendix \@ref(sec:resultsheterogeneitysharing) for more details.

# Discussion and conclusion {#sec:conclusion}

With the advent of the Internet and social media, the number of sources for political information has multiplied. A particularly worrying aspect is that malevolent actors can, and do, easily invent sources that present themselves as legitimate online news organizations. In a novel survey experiment, we investigate how source characteristics and content affect people’s belief that the factual claims contained in a news report are true and their intention to share the report. In line with our first hypothesis, we find that individuals have a stronger belief in the report by a real than by a fake source, as well as a higher intention to share it, at least via Facebook and Whatsapp. However, the gap between the real source and the fake source is not particularly large. Additional evidence shown in the Appendix suggests that the naming alone of our made-up source made a difference. A worrying implication of our finding is that people might be easily fooled by malevolent actors who get such tiny signals right.

Our key finding concerns the role of attitudinal congruence of content. Being exposed to a series of reports that present facts congruent (rather than incongruent) with one’s attitudes increases belief in and Facebook sharing of a subsequent report of that same source. Interestingly, we find this congruence effect exclusively for the fake source. Hence, while belief in a real source with an arguably widely known reputation seems less malleable, fake sources may manage to build up credibility through catering to readers’ world views. Given that this effect could be seen after only three news reports in our experiment, the potential for fake news entrepreneurs or hyperpartisan sources to win people’s trust might be substantial—whether through selective reporting or outright misinformation. 

Furthermore, we show that the effects on belief vary across subpopulations. The positive effect of obtaining information from our real source on belief seems to be moderated by distrust of the mainstream media, vote intention for the populist right, and political knowledge. This parallels other recent evidence that people low on media trust are more susceptible to disinformation [@Zimmermann2020-ud]. Similarly, the effect of congruence varies in particular across vote intention for the populist right. Considering the anti-elitist dimension of populism [e.g. @Schulzetal2018], it is not surprising that voters of the populist right seem to quickly develop trust in a "new" and ideologically convenient source.

Several details and limitations of our experiment warrant discussion and open avenues for future research. First, our findings suggest belief and sharing are distinct outcomes. As far as the relationship between the two variables is concerned, we find relatively low correlations between belief and sharing intention (see Figure \@ref(fig-A6)). We suggest that sharing is explicable only to a small extent by belief, as implied by other recent studies [@Pennycooketal2019b; @Bright2016-os]. To what extent sharing is caused by belief should be answered by designs that, in contrast to our study, causally identify belief as an independent variable, and separate it from other factors [e.g., @Valenzuela2017-lb; @Cappella2015-kq]. Importantly, sharing decisions are affected by various factors that may be less important for belief, such as an item’s newsworthiness, informational utility, valence, and framing. For example, recent studies have shown that content with moral frames and language is more likely to be shared [@Valenzuela2017-lb]. This could be one reason why our results for belief and sharing intentions differ. In our study, we only create factual differences between treatment stimuli. Future studies should combine manipulations of facts with manipulations of other content-level variables such as the framing of information.

The lack of effects on sharing might also be due to statistical power. Our sample was recruited according to nationally representative quotas. This inevitably meant including many subjects who do not use some of the social media services. As we only measured sharing outcomes for users of the respective platforms, sample sizes for the analyses of sharing intention were much smaller. This was especially the case for Twitter. However, we note that the findings for Facebook seem robust despite a smaller sample size. Future work should collect larger samples of social media users (or oversample them) to have enough power to identify small effects.
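
To give a sense of the sample sizes involved, the following sketch uses base R's `power.t.test()` for a simple two-group comparison. The standardized effect of 0.2 and the per-group sample sizes are illustrative assumptions, not estimates from this study, and the calculation ignores covariates and the ordinal nature of the sharing measures.

```r
# Illustrative power calculation; effect size and sample sizes are assumptions
n_per_group <- c(100, 250, 500, 1000)

# Power of a two-sided, two-sample t-test for a standardized effect of 0.2
power_by_n <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 0.2, sd = 1, sig.level = 0.05)$power
})

# Per-group sample size required for 80% power at the same effect size
n_required <- power.t.test(delta = 0.2, sd = 1, sig.level = 0.05, power = 0.8)$n

data.frame(n_per_group, power = round(power_by_n, 2))
ceiling(n_required)
```

Under these assumptions, subsamples of a few hundred platform users per condition leave a small effect well below conventional power thresholds.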

Second, our design was not focused on identifying the psychological mechanisms, i.e., the thought processes underlying belief or sharing. Rather, we approached the concept of source credibility from the perspective of the source. Given that anyone can quickly set up what appears to be a news source through a web site or a Facebook page, we asked to what extent people distinguish between a real source and a fake source imitating a credible outlet. This approach meant that we did not delve into perceptions about the sources. For example, it could be that subjects thought the made-up source was a legitimate news organization, but unknown to them; alternatively, they may have assumed it was a site potentially peddling misinformation. Although we ultimately cannot know how participants perceived the sources at the experimental stage, our data suggest that subjects in our experiment judged the made-up source as unfamiliar but to some extent credible. Pretreatment measures of familiarity and trust over a range of sources suggest that even though most subjects rightly said they did not know the made-up source, they still rated it as relatively trustworthy (e.g., better than the tabloid *Bild*; see Figures \@ref(fig-A2) and \@ref(fig-A3)). Slight differences in the name of a source may have strong effects, as implied by the difference in trust in the made-up sources *Nachrichten 360* (used in our experiment) and *Berliner Nachrichten* (cf. Figure \@ref(fig-A3) in the Appendix). To gain a better understanding of the psychological underpinnings that cause such differences, future studies should further explore the experimental variation of sources’ names, appearance, and content in combination with open-ended probing questions. Eye tracking could provide a fruitful way to measure to what extent individuals perceive and discern different signals [@Sulflow2019-kt].

Third, our choice of the topic---immigration---warrants discussion. While we are confident that there was nothing extraordinarily controversial about the topic at the particular time of the study (see our explorations of Google Trends in Appendix \@ref(sec:saliency)), immigration has been a relatively salient and controversial topic in recent years. We cannot say with certainty that our findings---especially that a source gains credibility when it provides congruent information---would show for other topics. However, we came to our theoretical prediction on the basis of motivated reasoning theory. If people have a tendency to defend their pre-existing attitudes, a source helping them to do so will gain credibility in their view. Many scholars have argued that motivated reasoning should occur in particular for issues of ideological conflict [e.g. @Taberetal2009; @Slothuus2010-wp; @Kahan2016a]. This seems to suggest that our findings apply to  other controversial issues. Nonetheless, future work should test the scope of our findings. 

Finally, in our experiment we neglected individuals’ choices of both news sources and content. The  interplay of source selection with media trust [@Tsfati2005-va; @Fletcher2017-nl] as well as with extreme attitudes [@GentzkowShapiro2011; @Bakshyetal2015; @Flaxmanetal2016; @Eady2019-uk] is complicated. Individual choice on social media platforms also interacts with a platform’s technology. Accordingly, whether the effects studied are reduced or even magnified by individual choice is a central question for future research.

Our findings have implications for the spread of misinformation on social media. Platforms play a key role in directing users to fake-news websites [@Guess2020-kz]. Relying on the mechanism we identified, such websites may try to exploit recommendation algorithms (that show users what they like) to build a “relationship” of trust with their readers and viewers. Therefore, once a source has been identified as questionable by a platform, further contact between source and user should be cut off. The source should also be prevented from initiating contact with further users. Once a source--user relationship continues outside of the platform, it is too late for platforms to react. Our findings also imply that small signals such as a source’s name have a strong impact. We motivated our study with the example of the “Denver Guardian”, a scam outlet ostensibly playing on existing trusted news brands and familiar places. Platforms’ filtering algorithms should be adapted to detect the use of such tiny signals as well as more abstract strategies underlying them. Our results support recent attempts by platforms to integrate source- and network-level characteristics into automated detection mechanisms [@Gereme2019-fe]. In addition, platforms should educate users as to how sources may fool them into believing and sharing misinformation. Making users aware of the psychological mechanisms that increase their vulnerability, possibly through playful demonstrations of how to spot misinformation, could help [@Guess2020-er]. In a battle between automated detection tools and misinformation entrepreneurs that tailor their strategies, platforms have to develop solutions that are adaptive and evolve over time.

\clearpage

# References

\linespread{1}


<div id="refs_main"></div>



\linespread{1.6}


\clearpage


<!--- appendix split -->

\appendix
\addtocontents{toc}{\protect\setcounter{tocdepth}{2}}




\renewcommand{\thesection}{A}

\setcounter{page}{1}

\setcounter{table}{0}
\renewcommand{\thetable}{A\arabic{table}}
\renewcommand{\tablename}{Table}

\setcounter{figure}{0}
\renewcommand\thefigure{A\arabic{figure}}
\renewcommand{\figurename}{Figure}


\clearpage
\pagenumbering{gobble}

\vspace*{3cm}

\begin{center}
\begin{huge}
Online Appendix
\end{huge}
\end{center}
\vspace{3cm}

\begin{center}
\begin{huge}
Believing and Sharing Information by Fake Sources: An Experiment
\end{huge}
\end{center}
\vspace{3cm}

\clearpage
\pagenumbering{arabic} 
\vspace*{2cm}
\tableofcontents

\clearpage



## Measures: Wording and coding of survey questions {#sec:questions}

The table below outlines question wording (translated from German) and coding decisions. All coding is made transparent in the R Markdown file accompanying the study.

\linespread{1}

\scriptsize


| Variable | Question | Choices/Coding  |
|----------|------------------------|-------------|
| Age | "How old are you? "  | Drop-down input  |
| Gender | "Which gender do you have?"   | "Female"/"Male"  |
| State of residence | "In which federal state do you live?"  | Choice among 16 federal states |
| Source knowledge | "Do you recognize the following media? " | "Yes"/"No" battery for 11 news sources in random order; average index calculated for 7 mainstream sources |
| Source reading/watching | "And have you, whether online or offline, ever read or viewed news of the respective medium?"  | "Yes"/"No" battery for 11 news sources in random order; asked only for news sources that respondents indicated recognizing |
| Source trust | "Even if you don't know all of them: Do you think you can trust the following media?"  | Labelled 5-point scale: "Not at all", "rather not", "partly", "rather", "completely"; average index calculated for 7 mainstream sources   |
| Immigration culture attitude  | "Would you say that cultural life in Germany is generally undermined or enriched by immigrants? 0 means undermined, 10 means enriched."  | 11-point scale with labelled end points; average index calculated together with other immigration attitude items |
| Immigration economy attitude  | "Would you say that it is generally bad or good for the German economy that immigrants come here? 0 means bad, 10 means good." | 11-point scale with labelled end points  |
| Immigration security attitude | "Would you say that Germany becomes less safe or safer with immigrants? 0 means less safe and 10 means safer."  | 11-point scale with labelled end points |
| Immigration life attitude | "Would you say that immigrants make Germany a worse or better place to live? 0 means worse and 10 means better."  | 11-point scale with labelled end points  |
| Immigration border attitude  | "Would you say that Europe sufficiently protects its external borders against illegal immigration? 0 means protection is not sufficient, 10 means protection is sufficient." | 11-point scale with labelled end points  |
| Platform use | "Do you have an Email/Facebook/Twitter account? Do you have Whatsapp installed?" | Choice between Yes/Yes, but I don't use it/No |
| Sharing frequency | "How often do you spread news reports via Email/Facebook/Twitter/Whatsapp? "| Choice between "Less"/"Several times a year"/"Several times a month"/"Once a week"/"Daily" |
| Political knowledge | "Which is the party of the following politician?" | Choice between 7 parties and "Don't know" asked for 9 politicians; combined into index adding correct answers |
| Overconfidence | "You just answered nine knowledge questions. How many do you think you answered correctly?" | 10-point scale from 0 to 9; overconfidence computed by subtracting knowledge index from this answer |
| Education | "What is the highest educational attainment you have achieved?" | Choice between 7 options |
| Turnout | "Have you voted at the 2017 federal elections?" | Choice between "Yes"/"No"/"I was not eligible to vote"/"Don't Know" |
| Vote/Hypothetical vote | "Which party have you voted for with your main vote ("Zweitstimme")?" If respondent had not voted or did not know: "Which party would you have voted with your main vote ("Zweitstimme")?" | Choice between six main parties/"Other party"/"Don't Know" |
| Income | "What is the complete net income of your household?" | Choice between 11 income brackets, from "0 - 500 Euro" to "5001 Euro or more" |


Table: Measures: Wording and coding of survey questions\label{tab:tab-A1}


\normalsize

\linespread{1.6}
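
The coding of the overconfidence measure described in the table can be sketched in a few lines. The three respondents are made-up values, and `know_politicians_guess` is a hypothetical name for the self-assessed number of correct answers; `know_politicians_total` and `overconfidence` follow the variable names used in the analysis code.

```r
# Made-up responses for three respondents (illustration only)
answers <- data.frame(
  know_politicians_total = c(9, 4, 6), # correct party assignments out of 9
  know_politicians_guess = c(7, 8, 6)  # self-assessed number correct (0-9)
)

# Overconfidence: self-assessment minus actual knowledge index.
# Positive values mean respondents believe they know more than they do.
answers$overconfidence <-
  answers$know_politicians_guess - answers$know_politicians_total

answers
```

The resulting measure ranges from -9 (all answers correct, none believed correct) to 9 (no answer correct, all believed correct).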

\clearpage

## News reports used in the survey experiment

Below, we provide the texts of the five experimental reports in the original German, and a translated English version. Reports 1 and 5 were true but slightly shortened reports and not manipulated, so the only text variation is the reference to the source. Reports 2--4 were manipulated both with regard to their content and the source. Differences are indicated in square brackets, with pro-migration contents first and anti-migration contents second.
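
The bracket notation can be read mechanically, as the following sketch shows. `render_report()` is a hypothetical helper written for this illustration and was not part of the survey implementation; the template string is a shortened, made-up example in the style of Report 2.

```r
# Hypothetical helper: resolve "[pro / anti]" pairs and the "[SOURCE]" placeholder
render_report <- function(template, source_name, version = c("pro", "anti")) {
  version <- match.arg(version)
  # Keep the first alternative for pro-migration, the second for anti-migration
  keep <- if (version == "pro") "\\1" else "\\2"
  # "[SOURCE]" contains no " / " and is therefore not touched by this pattern
  text <- gsub("\\[([^]/]*) / ([^]]*)\\]", keep, template)
  gsub("[SOURCE]", source_name, text, fixed = TRUE)
}

template <- paste(
  "Immigrants [not overrepresented / overrepresented] among suspects,",
  "as [SOURCE] reports: [1.9 / 15.1] percent."
)

pro_text  <- render_report(template, "Nachrichten 360", "pro")
anti_text <- render_report(template, "Nachrichten 360", "anti")
pro_text
anti_text
```

Each call returns one complete stimulus text, so the same template yields the pro-migration and anti-migration versions for either source.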

### Report 1 

#### Original German version

*Headline:* Weniger Anträge als erwartet

*Teaser (in Screenshot only):* Die Frage des Familiennachzugs hatte die Koalitionsverhandlungen schwer belastet. Nun zeigt sich: Nur wenige Flüchtlinge wollen ihre Angehörigen nachholen.

*Text:* Flüchtlinge mit eingeschränktem Schutz wollen viel weniger Familienangehörige nach Deutschland nachziehen lassen als vielfach prognostiziert. Von Anfang August bis zum 5. November 2018 sind lediglich 786 Visa für den Familiennachzug zu subsidiär Schutzberechtigten erteilt worden. Das geht aus einem Vermerk der Bundesregierung hervor, der [SOURCE] vorliegt. Von August bis Jahresende sind maximal 5000 Visa möglich.

Die Frage des Familiennachzugs zu den etwa 300.000 Flüchtlingen, die mit sogenanntem subsidiären Schutzstatus in Deutschland leben, hatte zu Jahresbeginn die Koalitionsverhandlungen zwischen Union und SPD schwer belastet. Politiker hatten gewarnt, dass bis zu 300.000 Angehörige nach Deutschland kommen wollten, sollte der damals geltende Stopp des Familiennachzugs auslaufen.

Die Koalition einigte sich auf den Kompromiss, monatlich bis zu 1000 Visa für enge Angehörige - also Ehepartner, minderjährige Kinder oder Eltern hier lebender Minderjähriger - zu vergeben. Doch diese Zahl wird auch nach Einschätzung des Innenministeriums kaum erreicht werden, wie [SOURCE] aus Ministeriumskreisen erfahren hat.

#### Translated English version

*Headline:* Fewer applications than expected

*Teaser (in screenshot only):* The issue of family reunification had put a heavy strain on the coalition negotiations. Now it is clear that only a few refugees want to bring their relatives to join them.

*Text:* Refugees with limited protection want far fewer family members to follow them to Germany than many had predicted. Between the beginning of August and November 5, 2018, only 786 visas were issued for family reunification with persons granted subsidiary protection. This emerges from a federal government memo that is available to [SOURCE]. From August to the end of the year, a maximum of 5,000 visas are possible.

At the beginning of the year, the issue of family reunification with the approximately 300,000 refugees who are living in Germany with so-called subsidiary protection status had put a heavy strain on the coalition negotiations between the CDU/CSU and the SPD. Politicians had warned that up to 300,000 relatives would want to come to Germany if the stop on family reunification, which was in effect at the time, expired.

The coalition agreed on a compromise to issue up to 1000 visas per month for close relatives - i.e. spouses, under-age children or parents of minors living here. But even according to estimates by the interior ministry this number will hardly be reached, as [SOURCE] has learned from ministry circles.

### Report 2

#### Original German version

*Headline:* Zuwanderer unter Tatverdächtigen [nicht überrepräsentiert / überrepräsentiert]

*Teaser (in Screenshot only):* Die Zahlen der neuesten Polizeistatistik zeigen: Zuwanderer sind in Deutschland [nicht überdurchschnittlich / überdurchschnittlich] oft tatverdächtig.

*Text:* Im vergangenen Jahr waren [1,9 / 15,1]  Prozent aller Straftatverdächtigen in Deutschland Zuwanderinnen und Zuwanderer. Insgesamt wurden 1,97 Millionen Verdächtige einer Straftat registriert, davon waren [37.233 / 297.273] Zuwanderer. Das geht aus der Polizeilichen Kriminalstatistik (PKS) für 2017 hervor, die [SOURCE] vorab analysiert hat.  

Als Zuwanderer erfasst die PKS 2017 Asylbewerber, Geduldete, "unerlaubt Aufhältige", subsidiär Geschützte sowie die wenigen Kontingentflüchtlinge, die über internationale Hilfsprogramme in Deutschland Aufnahme fanden. Der so definierte Personenkreis hat Ende 2017 etwa zwei Prozent der Bevölkerung ausgemacht. Es gab vergangenes Jahr also [etwa so viele / mehr] Verdachtsfälle gegen Zuwanderer, [wie / als] es ihrem Anteil an der Gesamtbevölkerung entspricht. Bei den Berechnungen wurden jeweils ausländerrechtliche Straftaten wie etwa ein illegaler Aufenthalt ausgeklammert, da diese nicht von Deutschen begangen werden können. 

Die Informationen bestätigen einen Trend, über den [SOURCE] bereits im Vorjahr berichtet hatte. Demnach sind Zuwanderer [nicht überdurchschnittlich / überdurchschnittlich] oft an der gesamten registrierten Kriminalität beteiligt.

#### Translated English version

*Headline:* Immigrants [not overrepresented / overrepresented] among criminal suspects

*Teaser (in screenshot only):* The figures from the latest police statistics show: Immigrants [are not / are] suspected of a crime at an above-average rate in Germany.

*Text:* Last year, [1.9 / 15.1] percent of all suspected criminals in Germany were immigrants. A total of 1.97 million suspects of a crime were registered, of which [37,233 / 297,273] were immigrants. This is the result of the police crime statistics (PKS) for 2017, which [SOURCE] has analyzed in advance.  

The 2017 statistics include asylum seekers, tolerated persons, "unauthorised residents", people granted subsidiary protection and the few contingent refugees who have been accepted in Germany through international aid programmes. At the end of 2017, the group of people defined in this way made up about two percent of the population. So last year there were [about as many / more] suspected cases against immigrants [as / than] corresponds to their share of the total population. In the calculations, criminal offences under aliens law, such as illegal residence, were excluded, as these cannot be committed by Germans. 

The information confirms a trend that [SOURCE] had already reported in the previous year. Accordingly, immigrants [are not / are] involved in total registered crime at an above-average rate.

### Report 3

#### Original German version

*Headline:* Mehrheit von Flüchtlingen schafft den [Deutschtest / Deutschtest nicht]

*Teaser (in Screenshot only):* Seit zwölf Jahren gibt es Integrationskurse für Flüchtlinge – [mit / ohne] Erfolg, wie die Abschlusszahlen des letzten Jahres zeigen.

*Text:* Seit mehr als zwölf Jahren lernen Zuwanderer Deutsch in den Integrationskursen des Bundesamts für Migration und Flüchtlinge (BAMF). Nach [anfänglichen Schwierigkeiten sind die Kurse inzwischen ein Erfolg / anfänglichem Erfolg stecken die Kurse inzwischen in Schwierigkeiten]: Eine Auswertung der Statistiken durch [SOURCE] zeigt, dass die Abschlussquoten deutlich [gestiegen / gesunken] sind.

So nahmen 2017 genau 376.468 Menschen erstmals an einem Integrationskurs teil. In diesem Zeitraum haben [337.504 / 111.623] Teilnehmer den Integrationskurs erfolgreich absolviert, also circa [90 / 30] Prozent. Die Erfolgsquote ist damit wesentlich [höher / geringer] als noch vor wenigen Jahren: Für den Zeitraum von 2005 bis 2015 ergibt die Analyse durch [SOURCE] einen Wert von knapp 1,1 Millionen Absolventen bei über 1,8 Millionen Teilnehmern, also einer Abschlussquote von etwa 60 Prozent. 

Der [Erfolg / Misserfolg] spiegelt sich auch in den Prüfungsergebnissen: Von denjenigen, die 2017 am Sprachtest zum Kursende teilnahmen, schaffte eine [Mehrheit von 71,2% / nur eine Minderheit von 31,2%] das Sprachniveau B1, dem Maßstab für ausreichende deutsche Sprachkenntnisse laut Aufenthaltsgesetz.

#### Translated English version


*Headline:* The majority of refugees [pass / do not pass] the German test

*Teaser (in screenshot only):* Integration courses for refugees have been running for twelve years - [with / without] success, as the final figures of last year show.

*Text*: For more than twelve years, immigrants have been learning German in the integration courses of the Federal Office for Migration and Refugees (BAMF). After [initial difficulties, the courses are now a success / initial success the courses are now in difficulties]: An evaluation of the statistics by [SOURCE] shows that the completion rates have [risen / fallen] substantially.

In 2017, exactly 376,468 people took part in an integration course for the first time. During this period, [337,504 / 111,623] participants successfully completed the integration course, i.e. about [90 / 30] percent. The success rate is thus considerably [higher / lower] than just a few years ago: For the period from 2005 to 2015, the analysis by [SOURCE] shows a figure of just under 1.1 million graduates with over 1.8 million participants, i.e. a completion rate of around 60 percent.

The [success / failure] is also reflected in the examination results: Of those who took the language test at the end of the course in 2017, a [majority of 71.2% / only a minority of 31.2%] achieved language level B1, the benchmark for sufficient German language skills according to the Residence Act.

### Report 4

#### Original German version

*Headline:* Es gibt [keine / eine] Sogwirkung der Seenotrettung  

*Teaser (in Screenshot only):* Die Flüchtlingszahlen im Mittelmeer der vergangenen Jahre lassen den Schluss zu, dass die private Seenotrettung [keine / eine] Sogwirkung entfaltet.

*Text:* Die private Seenotrettung von Migranten zwischen Italien und Nordafrika wurde im Sommer 2018 durch die italienische Regierung so gut wie gestoppt – doch die Diskussion über die Einsätze der Retter tobt weiter. Kritiker der NGOs glauben, dass erst die Anwesenheit der Schiffe Migranten dazu bewegt, den lebensgefährlichen Trip zu wagen. Die Helfer selbst und ihre Unterstützer wollen von einer Sogwirkung ihres Handelns nichts wissen. Die Redaktion der [SOURCE] hat die Zahlen des UN-Flüchtlingshilfswerks zu den Ankünften nach Italien analysiert. 

Insgesamt [entkräften / bestärken] die Daten den Verdacht, dass die Rettungsboote die Migration angekurbelt haben. Ende 2013 startete mit Mare Nostrum die erste große staatliche Rettungsmission von italienischer Seite. Im ersten Jahr der Mission [sank / stieg] die Anzahl der ankommenden Menschen gegenüber dem Vorjahr auf [85.050 / 170.000]. Die Italiener stoppten Ende 2014 das Programm. Im Jahr 2015 wurde Mare Nostrum durch die militärische EU-Mission Triton ersetzt, die mit deutlich weniger Schiffen ausgestattet war. 
[Trotzdem / In Folge dessen] kamen mit [153.842 / 76.921] wieder [mehr / weniger] Bootsflüchtlinge nach Italien.

Schließlich fanden sich 2016 immer mehr private Retter im Mittelmeer ein. Zur Hochzeit ihrer Aktivität [verringerte / vergrößerte] sich die Anzahl der Ankünfte, auf [59.685 / 119.369] im Jahr 2017. [Obwohl / Seit] die italienische Regierung vermehrt gegen die NGOs vorgeht, gab es 2018 wieder etwas [mehr / weniger] Ankünfte. Die von [SOURCE] analysierten Daten legen also [nicht nahe / nahe], dass die Anwesenheit der Retter eine Sogwirkung entfaltet.

#### Translated English version

*Headline:* There [is no / is a] pull effect of sea rescue  

*Teaser (in screenshot only):* The refugee figures for the Mediterranean in recent years lead to the conclusion that private sea rescue has [no / a] pull effect.

*Text:* The private sea rescue of migrants between Italy and North Africa was almost stopped by the Italian government in the summer of 2018 - but the discussion about the rescue operations continues to rage. Critics of the NGOs believe that the presence of the ships alone will motivate migrants to make the life-threatening trip. The rescuers themselves and their supporters deny any pull effect of their actions. Reporters of [SOURCE] have analyzed the figures of the UN refugee relief organisation on the arrivals in Italy. 

Overall, the data [refute / confirm] the suspicion that the rescue boats have stimulated migration. At the end of 2013, Italy launched Mare Nostrum, the first major government rescue mission. In the first year of the mission, the number of arrivals [fell / rose] to [85,050 / 170,000] compared to the previous year. The Italians stopped the programme at the end of 2014. In 2015, Mare Nostrum was replaced by the EU military mission Triton, which was equipped with significantly fewer ships. [Nevertheless / As a consequence], [more / fewer] boat refugees came to Italy again, with [153,842 / 76,921] arrivals.

Finally, more and more private rescuers arrived in the Mediterranean in 2016. At the peak of their activity, the number of arrivals [decreased / increased] to [59,685 / 119,369] in 2017. [Although / Since] the Italian government has been taking increasing action against the NGOs, there were again somewhat [more / fewer] arrivals in 2018. The data analyzed by [SOURCE] thus [do not suggest / suggest] that the presence of the rescuers has a pull effect.

### Report 5 

#### Original German version

*Headline:* Weniger Flüchtlinge nutzen Ausreiseförderung

*Teaser (in Screenshot only):* Im vergangenen Jahr haben weniger Flüchtlinge das Förderprogramm der Bundesregierung zur freiwilligen Rückkehr angenommen.

*Text:* Die Zahl der Flüchtlinge, die über ein Förderprogramm der Bundesregierung freiwillig in ihre Heimat zurückkehren, ist in den vergangenen drei Jahren deutlich gesunken. Von Januar bis Ende November 2018 haben 15.089 Menschen das Angebot in Anspruch genommen, wie eine Auswertung der Statistiken durch [SOURCE] ergab.

Für das gesamte Jahr 2017 ergibt die Auswertung eine Zahl von 29.522 freiwilligen Rückkehrern, 2016 waren es noch 54.006. "Die - im Vergleich relativ hohe - Zahl der bewilligten freiwilligen Ausreisen des Jahres 2016 ist im Zusammenhang mit den in diesem Zeitraum historisch hohen Zugangszahlen von in Deutschland schutzsuchenden Menschen zu sehen", erklärte eine Sprecherin des BAMF gegenüber [SOURCE].

Die Zahlen der Behörde beziehen sich ausschließlich auf die von Bund und Ländern angebotenen Programme für Rückkehrer. Dabei werden die Reisekosten von Rückkehrwilligen übernommen und je nach Fall auch eine weitere Reisebeihilfe und ein Startgeld für das neue Leben im Heimatland.

#### Translated English version

*Headline*: Fewer refugees use return funds

*Teaser (in screenshot only):* In the past year, fewer refugees have accepted the federal government's support program for voluntary return.

*Text:* The number of refugees who voluntarily return to their home country through a federal government support program has decreased significantly in the past three years. From January to the end of November 2018, 15,089 people took advantage of the offer, according to an evaluation of the statistics by [SOURCE].

For the entire year 2017, the evaluation reveals a number of 29,522 voluntary returnees, compared to 54,006 in 2016. "The - relatively high - number of voluntary departures granted in 2016 must be seen in connection with the historically high number of people seeking protection in Germany during this period," a spokeswoman of the BAMF told [SOURCE].

The authority's figures refer exclusively to the programmes for returnees offered by the Federal Government and the Länder. These programmes cover the travel costs of those willing to return and, depending on the case, also provide a further travel allowance and a start-up grant for the new life in the home country.

\newpage

## Descriptive graphs and summary statistics {#sec:summarystats}

```{r fig-A1, echo=FALSE, fig.cap="Distributions: gender, age, income and migration attitude index\\label{fig-A1}", fig.height=6, fig.width=7, message=FALSE, warning=FALSE, out.extra=''}

# Gender

p1 <- ggplot(data, aes(sex_fac)) +
  geom_bar(fill = "gray") +
  labs(x = "Gender", y = "Frequency") +
  theme_light() +
  scale_x_discrete(labels = c("male", "female"))

# Age

p2 <- ggplot(data, aes(x = age)) +
  geom_histogram(binwidth = 5, fill = "gray") +
  labs(x = "Age", y = "Frequency") +
  theme_light()

# Personal income

p3 <- ggplot(data, aes(income_fac)) +
  geom_bar(fill = "gray") +
  labs(x = "Income", y = "Frequency") +
  theme_light() +
  theme(
    axis.text.x = element_text(
      angle = 35, hjust = 1, vjust = 1,
      margin = margin(0.2, 0, 0.3, 0, "cm")
    ),
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm")
  )

# Migration attitudes

p4 <- ggplot(data, aes(x = migration_attitude_average)) +
  geom_histogram(binwidth = 1, fill = "gray") +
  theme_light() +
  labs(x = "Migration attitude index", y = "Frequency") +
  geom_vline(aes(xintercept = cutoffs$mean),
    colour = "red", show.legend = FALSE
  ) +
  geom_vline(aes(xintercept = cutoffs$lower),
    colour = "blue", show.legend = FALSE
  ) +
  geom_vline(aes(xintercept = cutoffs$upper),
    colour = "blue", show.legend = FALSE
  )




# pdf("output/fig-A1.pdf", width = 7, height = 6)
# plots_combined <- grid.arrange(p1, p2, p3, p4, ncol = 2)
# plots_combined
# dev.off()



plots_combined <- ggarrange(p1, p2, p3, p4,
  common.legend = TRUE,
  legend = "bottom",
  ncol = 2,
  nrow = 2
)


annotate_figure(plots_combined,
  bottom = text_grob("Note: Plot 4 displays the distribution of the individual average across the scales measuring attitudes towards\nimmigration. Based on these attitudes, we constructed two treatment variables in combination with the content\ntreatment (pro- and anti-migration).",
    color = "black",
    hjust = 0, x = 0, size = 10
  )
)


```




Tables \@ref(tab:tab-A2), \@ref(tab:tab-A3), and \@ref(tab:tab-A4) display summary statistics for all variables used in the analysis, along with some additional variables.

\linespread{1}

```{r tab-A2-tab-A3, echo=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE, results="asis"}

summary.stats <- data %>%
  dplyr::select(
    age,
    sex_num,
    contains("_num")
  ) %>%
  select(
    contains("know_"),
    contains("read_"),
    contains("trust_")
  ) %>%
  as.data.frame() %>%
  rename_with(~ gsub("_num", "", .x, fixed = TRUE)) %>% # drop the "_num" suffix
  rename_with(~ gsub("_", " ", .x, fixed = TRUE)) # underscores to spaces

names(summary.stats) <- str_to_title(names(summary.stats))

stargazer::stargazer(summary.stats,
  summary = TRUE,
  type = "latex",
  label = "tab:tab-A2",
  font.size = "scriptsize",
  table.placement = "H",
  column.sep.width = "1pt",
  title = "Summary statistics: Numeric variables (source knowledge, readership, and trust)",
  digits = 2,
  rownames = FALSE,
  header = FALSE,
  notes.append = FALSE,
  notes.align = "l",
  omit.summary.stat = c("p25", "p75")
)

summary.stats <- data %>%
  dplyr::select(
    age,
    sex_num,
    treatment_source,
    treatment_channel,
    treatment_content,
    treatment_congruence50,
    treatment_congruence_30,
    contains("_num")
  ) %>%
  select(
    age,
    sex_num,
    treatment_source,
    treatment_channel,
    treatment_content,
    treatment_congruence50,
    treatment_congruence_30,
    contains("belief_report"),
    contains("share_report"),
    matches("immigrant.*num"),
    matches("migration_attitude"),
    matches("sharing.*num")
  ) %>%
  select(-contains("prompt")) %>%
  as.data.frame() %>%
  rename_with(~ gsub("_num", "", .x, fixed = TRUE)) %>% # drop the "_num" suffix
  rename_with(~ gsub("_", " ", .x, fixed = TRUE)) # underscores to spaces

names(summary.stats) <- str_to_title(names(summary.stats))

stargazer::stargazer(summary.stats,
  summary = TRUE,
  type = "latex",
  label = "tab:tab-A3",
  font.size = "scriptsize",
  table.placement = "H",
  column.sep.width = "1pt",
  title = "Summary statistics: Numeric variables (demographics, treatments, outcomes, and attitudes)",
  digits = 2,
  rownames = FALSE,
  header = FALSE,
  notes.append = FALSE,
  notes.align = "l",
  omit.summary.stat = c("p25", "p75")
)
```

```{r tab-A4, echo=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE, results="asis"}

library(qwraps2)
options(qwraps2_markup = "markdown")

summary.table <- data %>%
  dplyr::select(
    education,
    income_fac,
    account_email_fac,
    account_fb_fac,
    account_twitter_fac,
    account_whatsapp_fac
  ) %>%
  summary_table(.)

# Modify table
a <- capture.output(print(summary.table,
  caption = "Summary statistics: Categorical variables", markup = "latex",
  rtitle = "Variable",
  cnames = c("Distribution")
))
a <- gsub("education", "Education (German school types)", a)
a <- gsub("income_fac", "Income", a)

a <- gsub("account_email_fac", "Use Email", a)
a <- gsub("account_fb_fac", "Use Facebook", a)
a <- gsub("account_twitter_fac", "Use Twitter", a)
a <- gsub("account_whatsapp_fac", "Use Whatsapp", a)

a <- gsub("\\\\hline", "", a)
a <- gsub("Unknown", "Missings", a)
a[5] <- gsub("\\|", "", a[5])
a[6] <- "\\hline\\hline"
a[8] <- "\\hline"
a[24] <- "\\hline"
a[50] <- "\\hline"
a[1] <- gsub("\\[t\\]", "\\[H\\]", a[1])

# Change size
a <- c(a[1:4], "\\scriptsize", a[5:length(a)])

cat(a)
```


\linespread{1.6}


\newpage

## Balance statistics and randomization {#sec:balance}

Tables \@ref(tab:tab-A5) and \@ref(tab:tab-A6) provide balance statistics for the source and content treatments, respectively. Table \@ref(tab:tab-A7) provides statistics across the $2 \times 2$ treatment groups. Significance tests of group differences use t-tests/ANOVA for continuous variables and chi-squared tests for categorical variables. None of the differences is statistically significant at the 5% level.
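
The tests described above can be sketched on simulated data. The following is only a minimal illustration, not the analysis code: the data frame `sim` and its columns (`treat`, `age`, `vote`, `combo`) are hypothetical stand-ins for the survey variables, and the seed is arbitrary.

```r
# Minimal sketch of balance tests on simulated data (all names are hypothetical).
set.seed(1)
n <- 400
sim <- data.frame(
  treat = rep(c(0, 1), each = n / 2),                  # randomized binary treatment
  age   = round(runif(n, min = 18, max = 80)),         # continuous covariate
  vote  = sample(c("A", "B", "C"), n, replace = TRUE)  # categorical covariate
)

# Continuous covariate, two groups: Welch two-sample t-test
p_age <- t.test(age ~ treat, data = sim)$p.value

# Continuous covariate, four groups: one-way ANOVA across the 2x2 cells
sim$combo <- interaction(sim$treat, rep(c(0, 1), times = n / 2))
p_age_aov <- summary(aov(age ~ combo, data = sim))[[1]][1, "Pr(>F)"]

# Categorical covariate: chi-squared test of independence
p_vote <- chisq.test(table(sim$vote, sim$treat))$p.value

round(c(age_t = p_age, age_anova = p_age_aov, vote_chisq = p_vote), 2)
```

Under successful randomization, such p-values should be roughly uniformly distributed, so an occasional small value across many covariates is not by itself evidence of imbalance.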

```{r tab-A5, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}

by.group <- data %>%
  select(treatment_source, sex_num, age, migration_attitude_average, vote_choice_cdu_csu) %>%
  pivot_longer(
    cols = c(sex_num, age, migration_attitude_average, vote_choice_cdu_csu),
    names_to = "variable", values_to = "value"
  ) %>%
  group_by(treatment_source, variable) %>%
  summarise(mean = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = "treatment_source", values_from = c("mean")) %>%
  mutate(
    variable =
      case_when(
        variable == "age" ~ "Age (mean)",
        variable == "migration_attitude_average" ~ "Migration attitude (mean)",
        variable == "sex_num" ~ "Sex (proportion)",
        variable == "vote_choice_cdu_csu" ~ "Vote CDU/CSU (proportion)"
      )
  )
by.group$"0" <- round(by.group$"0", 2)
by.group$"1" <- round(by.group$"1", 2)


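# Rows follow the alphabetical order of `variable`:
# 1 = age, 2 = migration attitude, 3 = sex, 4 = vote choice
by.group[1, "p-value"] <-
  round(t.test(age ~ treatment_source, data = data)$p.value, 2)
by.group[2, "p-value"] <-
  round(t.test(migration_attitude_average ~ treatment_source, data = data)$p.value, 2)
by.group[3, "p-value"] <-
  round(t.test(sex_num ~ treatment_source, data = data)$p.value, 2)
# The vote-choice test uses the full `vote_choice_all` variable,
# not only the CDU/CSU indicator shown in the row
by.group[4, "p-value"] <-
  round(chisq.test(table(data$vote_choice_all, data$treatment_source))$p.value, 2)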

knitr::kable(by.group,
  caption = "Balance statistics for source treatment",
  format = "latex", booktabs = T, longtable = T, escape = F,
  col.names = c("Variable", "Nachrichten 360", "Tagesschau", "Diff. p-value"),
  linesep = ""
) %>%
  add_header_above(c(" " = 1, "Source treatment" = 2)) %>%
  kable_styling(
    full_width = T,
    latex_options = c("striped", "scale_down", "HOLD_position"),
    font_size = 9
  )
```

```{r tab-A6, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}

by.group <- data %>%
  select(
    treatment_content, sex_num, age,
    migration_attitude_average, vote_choice_cdu_csu
  ) %>%
  pivot_longer(
    cols = c(sex_num, age, migration_attitude_average, vote_choice_cdu_csu),
    names_to = "variable", values_to = "value"
  ) %>%
  group_by(treatment_content, variable) %>%
  summarise(mean = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = "treatment_content", values_from = c("mean")) %>%
  mutate(
    variable =
      case_when(
        variable == "age" ~ "Age (mean)",
        variable == "migration_attitude_average" ~ "Migration attitude (mean)",
        variable == "sex_num" ~ "Sex (proportion)",
        variable == "vote_choice_cdu_csu" ~ "Vote CDU/CSU (proportion)"
      )
  )
by.group$"0" <- round(by.group$"0", 2)
by.group$"1" <- round(by.group$"1", 2)

by.group[1, "p-value"] <-
  round(t.test(age ~ treatment_content, data = data)$p.value, 2)
by.group[2, "p-value"] <-
  round(t.test(migration_attitude_average ~ treatment_content, data = data)$p.value, 2)
by.group[3, "p-value"] <-
  round(t.test(sex_num ~ treatment_content, data = data)$p.value, 2)
by.group[4, "p-value"] <-
  round(chisq.test(table(data$vote_choice_all, data$treatment_content))$p.value, 2)

knitr::kable(by.group,
  caption = "Balance statistics for content treatment",
  format = "latex", booktabs = T,
  col.names = c("Variable", "Pro-migration", "Anti-migration", "Diff. p-value"),
  linesep = ""
) %>%
  add_header_above(c(" " = 1, "Content treatment" = 2)) %>%
  kable_styling(
    full_width = T,
    latex_options = c("striped", "scale_down", "HOLD_position"),
    font_size = 9
  )
```

```{r tab-A7, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}

by.group <- data %>%
  select(
    treatment_content, treatment_source, sex_num, age,
    migration_attitude_average, vote_choice_cdu_csu
  ) %>%
  pivot_longer(
    cols = c(sex_num, age, migration_attitude_average, vote_choice_cdu_csu),
    names_to = "variable", values_to = "value"
  ) %>%
  group_by(treatment_content, treatment_source, variable) %>%
  summarise(mean = mean(value, na.rm = TRUE)) %>%
  pivot_wider(
    names_from = c("treatment_content", "treatment_source"),
    values_from = c("mean")
  ) %>%
  mutate_if(is.numeric, round, 2) %>%
  mutate(
    variable =
      case_when(
        variable == "age" ~ "Age (mean)",
        variable == "migration_attitude_average" ~ "Migration attitude (mean)",
        variable == "sex_num" ~ "Sex (proportion)",
        variable == "vote_choice_cdu_csu" ~ "Vote CDU/CSU (proportion)"
      )
  )

by.group[1, "p-value"] <-
  round(summary(aov(age ~ treatment_combination, data = data))[[1]][1, "Pr(>F)"], 2)
by.group[2, "p-value"] <-
  round(summary(aov(migration_attitude_average ~ treatment_combination, data = data))[[1]][1, "Pr(>F)"], 2)
by.group[3, "p-value"] <-
  round(summary(aov(sex_num ~ treatment_combination, data = data))[[1]][1, "Pr(>F)"], 2)
by.group[4, "p-value"] <-
  round(chisq.test(table(data$vote_choice_all, data$treatment_combination))$p.value, 2)

knitr::kable(by.group,
  caption = "Balance statistics for source*content treatment",
  format = "latex", booktabs = T,
  col.names = c(
    "Variable", "Pro-migration/Fake", "Pro-migration/Real",
    "Anti-migration/Fake", "Anti-migration/Real", "Diff. p-value"
  ),
  linesep = ""
) %>%
  add_header_above(c(" " = 1, "Source*Content treatment" = 4)) %>%
  kable_styling(
    full_width = T,
    latex_options = c("striped", "scale_down", "HOLD_position"),
    font_size = 9
  )
```



\linespread{1.6}
\newpage

## Representativeness: Sample and population {#sec:representativeness}
Our sample is a non-probability quota sample. Tables \@ref(tab:tab-A8), \@ref(tab:tab-A9), and \@ref(tab:tab-A10) display both population and sample statistics for the quota variables.
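
The comparison in these tables amounts to joining population shares to sample shares on the quota category. A minimal sketch with made-up numbers follows; `population` and `sample_df` are hypothetical and do not reproduce the quota files, and base R's `merge()` is used here only to keep the example self-contained.

```r
# Minimal sketch: compare sample shares with population quotas (made-up numbers).
population <- data.frame(
  sex = c("male", "female"),
  population_percent = c(49, 51)  # hypothetical population shares
)
sample_df <- data.frame(
  sex = c(rep("male", 48), rep("female", 52))  # hypothetical respondents
)

# Sample counts and shares per category
counts <- table(sample_df$sex)
sample_shares <- data.frame(
  sex = names(counts),
  data_absolute = as.numeric(counts),
  data_percent = as.numeric(prop.table(counts)) * 100
)

# Join on the quota category and compute the deviation in percentage points
comparison <- merge(population, sample_shares, by = "sex")
comparison$difference <- comparison$data_percent - comparison$population_percent
comparison
```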

\linespread{1}

```{r message=FALSE, warning=FALSE, paged.print=FALSE}
age_quotas <- read_csv("input/data_age_quotas.csv")
sex_quotas <- read_csv("input/data_sex_quotas.csv")
state_quotas <- read_csv("input/data_federal_state_quotas.csv")
```

```{r tab-A8, message=FALSE, warning=FALSE, paged.print=FALSE}
# Gender

data_sex_quotas <- data.frame(
  sex = names(prop.table(table(data$sex[!is.na(data$know_tagesschau)], useNA = "ifany"))),
  data_absolute = as.numeric(table(data$sex[!is.na(data$know_tagesschau)], useNA = "ifany")),
  data_percent = as.numeric(prop.table(table(data$sex[!is.na(data$know_tagesschau)], useNA = "ifany"))) * 100
)

data_sex_quotas$sex <- recode(data_sex_quotas$sex,
  "maennlich" = "male",
  "weiblich" = "female"
)
x <- left_join(sex_quotas, data_sex_quotas, by = "sex") %>% mutate_if(is.numeric, round, 2)

kable(x,
  caption = "Gender distribution (population and sample)",
  format = "latex", booktabs = T,
  col.names = Hmisc::capitalize(gsub("_", ": ", names(x)))
) %>%
  kable_styling(
    full_width = T,
    latex_options = c("striped", "scale_down", "HOLD_position"), font_size = 9
  )
```

```{r tab-A9, message=FALSE, warning=FALSE, paged.print=FALSE}

# Federal state

data_state_quotas <- data.frame(
  state = names(prop.table(table(data$federal_state[!is.na(data$know_tagesschau)], useNA = "ifany"))),
  data_percent = as.numeric(prop.table(table(data$federal_state[!is.na(data$know_tagesschau)], useNA = "ifany"))) * 100,
  data_absolute = as.numeric(table(data$federal_state[!is.na(data$know_tagesschau)], useNA = "ifany"))
)

x <- full_join(state_quotas, data_state_quotas, by = "state") %>%
  mutate_if(is.numeric, round, 2)

kable(x,
  caption = "Residence distribution (population and sample)",
  format = "latex", booktabs = T,
  col.names = Hmisc::capitalize(gsub("_", ": ", names(x)))
) %>%
  kable_styling(
    full_width = T,
    latex_options = c("striped", "scale_down", "HOLD_position"), font_size = 9
  )
```
  
  
```{r tab-A10, message=FALSE, warning=FALSE, paged.print=FALSE}  

# Age

data_age_quotas <- data.frame(
  age = names(prop.table(table(data$age_cat[!is.na(data$know_tagesschau)],
    useNA = "ifany"
  ))),
  data_percent = as.numeric(prop.table(table(data$age_cat[!is.na(data$know_tagesschau)], useNA = "ifany"))) * 100,
  data_absolute = as.numeric(table(data$age_cat[!is.na(data$know_tagesschau)], useNA = "ifany"))
)

x <- left_join(age_quotas, data_age_quotas, by = "age") %>%
  mutate_if(is.numeric, round, 2) %>%
  select(-population_percent_rounded)

kable(x,
  caption = "Age distribution (population and sample)",
  format = "latex", booktabs = T,
  col.names = Hmisc::capitalize(gsub("_", ": ", names(x)))
) %>%
  kable_styling(
    full_width = T,
    latex_options = c("striped", "scale_down", "HOLD_position"), font_size = 9
  )
```

\linespread{1.6}

\newpage

## Source knowledge and trust {#sec:sourceknowledgetrust}

In the first part of our study, we exposed subjects to a list of nine real or made-up news sources and asked whether they knew, had read/watched and trusted each source. Below, we report source-level averages of these outcomes. Figure \@ref(fig-A2) shows the proportion of subjects who say they recognize a source and the proportion who say they have read or watched it. Almost everyone knows our real source (*Tagesschau*), and only a few believe they know the source we made up (*Nachrichten 360*).

```{r fig-A2, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Average knowledge of news sources\\label{fig-A2}", fig.height=3.5, out.extra = '', fig.pos= "ht"}

# Replace missing values on read/watched variable with 0 if they do not recognize/know that source
## Question is not asked of those who indicate that they don't recognize this source
## Assumption: if they have 0 on 'know' it also means that they haven't read/watched it

familiarity_averages <- data

familiarity_averages$read_tagesschau_num[familiarity_averages$know_tagesschau_num == 0] <- 0
familiarity_averages$read_heute_num[familiarity_averages$know_heute_num == 0] <- 0
familiarity_averages$read_sz_num[familiarity_averages$know_sz_num == 0] <- 0
familiarity_averages$read_faz_num[familiarity_averages$know_faz_num == 0] <- 0
familiarity_averages$read_focus_num[familiarity_averages$know_focus_num == 0] <- 0
familiarity_averages$read_bild_num[familiarity_averages$know_bild_num == 0] <- 0
familiarity_averages$read_nachrichten360_num[familiarity_averages$know_nachrichten360_num == 0] <- 0
familiarity_averages$read_berliner_num[familiarity_averages$know_berliner_num == 0] <- 0
familiarity_averages$read_spiegel_num[familiarity_averages$know_spiegel_num == 0] <- 0
familiarity_averages$read_rtdeutsch_num[familiarity_averages$know_rtdeutsch_num == 0] <- 0
familiarity_averages$read_newsblitz_num[familiarity_averages$know_newsblitz_num == 0] <- 0

familiarity_averages <- familiarity_averages %>%
  select(
    "know_tagesschau_num", "know_heute_num", "know_sz_num",
    "know_faz_num", "know_focus_num", "know_bild_num",
    "know_nachrichten360_num", "know_berliner_num", "know_spiegel_num",
    "know_rtdeutsch_num", "know_newsblitz_num",
    "read_tagesschau_num", "read_heute_num", "read_sz_num",
    "read_faz_num", "read_focus_num", "read_bild_num",
    "read_nachrichten360_num", "read_berliner_num", "read_spiegel_num",
    "read_rtdeutsch_num", "read_newsblitz_num"
  ) %>% # only keep variables of interest
  gather(key = "variable", value = "value", know_tagesschau_num:read_newsblitz_num) %>%
  group_by(variable) %>%
  summarize(
    N = n(),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

familiarity_averages$key <- dplyr::recode(familiarity_averages$variable,
  know_tagesschau_num = "Tagesschau",
  know_heute_num = "Heute",
  know_sz_num = "SZ",
  know_faz_num = "FAZ",
  know_focus_num = "Focus",
  know_bild_num = "Bild",
  know_nachrichten360_num = "Nachrichten 360",
  know_berliner_num = "Berliner Nachrichten",
  know_spiegel_num = "Spiegel",
  know_rtdeutsch_num = "RTDeutsch",
  know_newsblitz_num = "Newsblitz",
  read_tagesschau_num = "Tagesschau",
  read_heute_num = "Heute",
  read_sz_num = "SZ",
  read_faz_num = "FAZ",
  read_focus_num = "Focus",
  read_bild_num = "Bild",
  read_nachrichten360_num = "Nachrichten 360",
  read_berliner_num = "Berliner Nachrichten",
  read_spiegel_num = "Spiegel",
  read_rtdeutsch_num = "RTDeutsch",
  read_newsblitz_num = "Newsblitz"
)

familiarity_averages$Type <- dplyr::recode(familiarity_averages$variable,
  know_tagesschau_num = "Mainstream",
  know_heute_num = "Mainstream",
  know_sz_num = "Mainstream",
  know_faz_num = "Mainstream",
  know_focus_num = "Mainstream",
  know_bild_num = "Mainstream",
  know_nachrichten360_num = "Made-up",
  know_berliner_num = "Made-up",
  know_spiegel_num = "Mainstream",
  know_rtdeutsch_num = "Hyperpartisan",
  know_newsblitz_num = "Hyperpartisan",
  read_tagesschau_num = "Mainstream",
  read_heute_num = "Mainstream",
  read_sz_num = "Mainstream",
  read_faz_num = "Mainstream",
  read_focus_num = "Mainstream",
  read_bild_num = "Mainstream",
  read_nachrichten360_num = "Made-up",
  read_berliner_num = "Made-up",
  read_spiegel_num = "Mainstream",
  read_rtdeutsch_num = "Hyperpartisan",
  read_newsblitz_num = "Hyperpartisan"
)

familiarity_averages <- familiarity_averages %>%
  mutate(known = as.numeric(str_detect(variable, "know"))) %>%
  mutate(known = factor(known, ordered = TRUE, levels = c("1", "0"))) %>%
  arrange(desc(known), desc(mean))

ggplot(familiarity_averages, aes(
  x = reorder(key, -mean),
  y = mean,
  group = as.factor(known),
  color = as.factor(Type),
  linetype = as.factor(known)
)) +
  geom_point(position = position_dodge(0.4), size = 0.7) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .9,
    position = position_dodge(0.4), size = 0.3
  ) +
  scale_color_manual(
    name = "Type of news source",
    values = add.alpha(c("orange", "blue", "red"), alpha = 1),
    labels = c("Hyperpartisan (real)", "Made-up", "Mainstream (real)")
  ) +
  scale_linetype_manual(
    values = c("solid", "dashed"),
    name = "Question",
    labels = c("Recognize: Yes", "Read/watch: Yes")
  ) +
  labs(x = "", y = "Proportion") +
  theme_light() +
  theme(
    axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 8),
    plot.title = element_text(size = 14, hjust = 0.5)
  )
```
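
The error bars in the figure above are t-based 95% confidence intervals around each mean, computed as the mean plus or minus `qt(0.975, df = n - 1) * sd / sqrt(n)`, where `n` is the number of non-missing responses. A minimal sketch on made-up responses follows; `mean_ci` is a hypothetical helper written for illustration, not a function from the analysis code.

```r
# Minimal sketch of a t-based 95% confidence interval for a mean.
# `mean_ci` is a hypothetical helper written for illustration only.
mean_ci <- function(x, level = 0.95) {
  n <- sum(!is.na(x))          # number of non-missing observations
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
  c(mean = m, conf.low = m - half_width, conf.high = m + half_width)
}

# Made-up 0/1 responses (1 = "recognize the source"), with one missing value
know <- c(1, 1, 0, 1, NA, 1, 0, 1, 1, 1)
mean_ci(know)
```

For proportions close to 0 or 1, an interval of this kind can extend beyond the unit interval; it is a simple approximation rather than an exact binomial interval.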

Figure \@ref(fig-A3) shows the averages of subjects' responses to the question: "Even if you don't know all of them: Do you think the following media can be trusted?" [Not at all, rather not, partly, rather, completely, cf. @PennycookRand2019c], which we asked irrespective of whether subjects knew the source. Averages are reported separately for those who said they knew the respective source and those who said they did not.

Taken together, the two figures shed light on what drives the source treatment: The familiarity gap between the two experimental sources is almost maximal (see Figure \@ref(fig-A2)), whereas the gap in perceived credibility/trustworthiness is much smaller (see Figure \@ref(fig-A3)). This suggests that something other than familiarity lends the fake source (*Nachrichten 360*) credibility. The substantial difference between the two made-up sources suggests that the name itself might matter. Still, (imagined) familiarity does play a role, since people who think they know a source also find it more credible than those who do not.

```{r fig-A3, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Average trust of news sources\\label{fig-A3}", fig.height=3.5, out.extra = '', fig.pos= "ht"}

trust_averages <- data %>%
  select(
    "trust_tagesschau_num", "trust_heute_num", "trust_sz_num",
    "trust_faz_num", "trust_focus_num", "trust_bild_num",
    "trust_nachrichten360_num", "trust_berliner_num", "trust_spiegel_num",
    "trust_rtdeutsch_num", "trust_newsblitz_num",
    "trust_tagesschau_known_num", "trust_heute_known_num",
    "trust_sz_known_num", "trust_faz_known_num", "trust_focus_known_num",
    "trust_bild_known_num", "trust_nachrichten360_known_num",
    "trust_berliner_known_num", "trust_spiegel_known_num",
    "trust_rtdeutsch_known_num", "trust_newsblitz_known_num"
  ) %>%
  gather(
    key = "variable", value = "value",
    trust_tagesschau_num:trust_newsblitz_known_num
  ) %>%
  group_by(variable) %>%
  summarize(
    N = n(),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

trust_averages$key <- dplyr::recode(trust_averages$variable,
  trust_tagesschau_num = "Tagesschau",
  trust_heute_num = "Heute",
  trust_sz_num = "SZ",
  trust_faz_num = "FAZ",
  trust_focus_num = "Focus",
  trust_bild_num = "Bild",
  trust_nachrichten360_num = "Nachrichten 360",
  trust_berliner_num = "Berliner Nachrichten",
  trust_spiegel_num = "Spiegel",
  trust_rtdeutsch_num = "RTDeutsch",
  trust_newsblitz_num = "Newsblitz",
  trust_tagesschau_known_num = "Tagesschau",
  trust_heute_known_num = "Heute",
  trust_sz_known_num = "SZ",
  trust_faz_known_num = "FAZ",
  trust_focus_known_num = "Focus",
  trust_bild_known_num = "Bild",
  trust_nachrichten360_known_num = "Nachrichten 360",
  trust_berliner_known_num = "Berliner Nachrichten",
  trust_spiegel_known_num = "Spiegel",
  trust_rtdeutsch_known_num = "RTDeutsch",
  trust_newsblitz_known_num = "Newsblitz"
)

trust_averages$Type <- dplyr::recode(trust_averages$variable,
  trust_tagesschau_num = "Mainstream",
  trust_heute_num = "Mainstream",
  trust_sz_num = "Mainstream",
  trust_faz_num = "Mainstream",
  trust_focus_num = "Mainstream",
  trust_bild_num = "Mainstream",
  trust_nachrichten360_num = "Made-up",
  trust_berliner_num = "Made-up",
  trust_spiegel_num = "Mainstream",
  trust_rtdeutsch_num = "Hyperpartisan",
  trust_newsblitz_num = "Hyperpartisan",
  trust_tagesschau_known_num = "Mainstream",
  trust_heute_known_num = "Mainstream",
  trust_sz_known_num = "Mainstream",
  trust_faz_known_num = "Mainstream",
  trust_focus_known_num = "Mainstream",
  trust_bild_known_num = "Mainstream",
  trust_nachrichten360_known_num = "Made-up",
  trust_berliner_known_num = "Made-up",
  trust_spiegel_known_num = "Mainstream",
  trust_rtdeutsch_known_num = "Hyperpartisan",
  trust_newsblitz_known_num = "Hyperpartisan"
)
trust_averages <- trust_averages %>%
  mutate(known = str_detect(variable, "known")) %>%
  mutate(known = ifelse(known == TRUE, 1, 0)) %>%
  mutate(known = as.factor(known)) %>%
  arrange(-desc(known), desc(mean))

ggplot(trust_averages, aes(
  x = reorder(key, -mean),
  y = mean,
  group = as.factor(known),
  color = as.factor(Type),
  shape = as.factor(known)
)) +
  geom_point(position = position_dodge(0.4), size = 1.1) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
    width = .9,
    position = position_dodge(0.4), size = 0.3
  ) +
  scale_color_manual(
    name = "Type of news source",
    values = add.alpha(c("orange", "blue", "red"), alpha = 1),
    labels = c("Hyperpartisan (real)", "Made-up", "Mainstream (real)")
  ) +
  scale_shape_manual(
    name = "Known",
    values = c(16, 17),
    labels = c("No", "Yes")
  ) +
  ylim(0, 4) +
  labs(x = "", y = "Average trust score") +
  theme_light() +
  theme(
    axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 8),
    plot.title = element_text(size = 14, hjust = 0.5)
  )
```
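The intervals in Figure \@ref(fig-A3) are t-based 95% confidence intervals computed from the non-missing responses only. The following minimal sketch reproduces that computation on a made-up vector and checks it against `t.test()`. The helper `mean_ci()` and the vector `x` are illustrative names and are not part of the replication code.

```r
# Illustrative helper (hypothetical, not from the replication code):
# mean and t-based confidence interval, ignoring missing values.
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)
  err <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(mean = m, conf.low = m - err, conf.high = m + err, n = n)
}

# Made-up trust scores on a 0-4 scale, with two missing answers
x <- c(0, 1, 2, 2, 3, 4, NA, 3, 1, NA)
res <- mean_ci(x)
res

# The interval agrees with the one reported by t.test()
all.equal(
  unname(res[c("conf.low", "conf.high")]),
  as.numeric(t.test(x)$conf.int)
)
```

The degrees of freedom use the number of valid answers rather than the number of rows, which matters for sources that many respondents did not rate.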

\newpage


## Believing and sharing by partisanship {#sec:robust-partisansharing} 

Figures \@ref(fig-A4) and \@ref(fig-A5) plot belief and sharing intentions against vote intention to explore whether there are any patterns across partisans that might qualify our results.

```{r fig-A4, fig.cap="Belief by vote intention\\label{fig-A4}", message=FALSE, warning=FALSE, paged.print=FALSE, fig.height=3, fig.width=6, out.extra = '', results = 'hide'}

belief_party <- data %>%
  select(
    belief_report_1_num,
    belief_report_5_num,
    vote_choice_all_fac
  ) %>%
  filter(!is.na(vote_choice_all_fac)) %>%
  group_by(vote_choice_all_fac) %>%
  get_summary_stats(type = "mean_se") %>%
  mutate(
    ymin = mean - 1.96 * se,
    ymax = mean + 1.96 * se
  ) %>%
  mutate(variable = gsub(
    "Fb", "Facebook",
    str_to_title(gsub("_", " ", gsub("belief_|_num", "", variable)))
  ))

ggplot(
  belief_party,
  aes(
    x = vote_choice_all_fac,
    y = mean
  )
) +
  geom_point(size = 1.1) +
  geom_errorbar(aes(ymin = ymin, ymax = ymax),
    width = .3,
    size = 0.7
  ) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~variable,
    nrow = 1,
    ncol = 2,
    labeller = label_value
  ) +
  labs(
    x = "Vote intention",
    y = "Belief"
  ) +
  ylim(c(0, 6))
```


```{r fig-A5, fig.cap="Sharing intention by vote intention\\label{fig-A5}", message=FALSE, warning=FALSE, paged.print=FALSE, fig.height=4, out.extra = '', results = 'hide'}

sharing_party <- data %>%
  select(
    share_report_1_email_num,
    share_report_1_fb_num,
    share_report_1_twitter_num,
    share_report_1_whatsapp_num,
    share_report_5_email_num,
    share_report_5_fb_num,
    share_report_5_twitter_num,
    share_report_5_whatsapp_num,
    vote_choice_all_fac
  ) %>%
  filter(!is.na(vote_choice_all_fac)) %>%
  group_by(vote_choice_all_fac) %>%
  get_summary_stats(type = "mean_se") %>%
  mutate(
    ymin = mean - 1.96 * se,
    ymax = mean + 1.96 * se
  ) %>%
  mutate(
    variable =
      gsub(
        "Fb", "Facebook",
        str_to_title(gsub("_", " ", gsub("share_|_num", "", variable)))
      )
  )

# Replace negative confidence values
sharing_party <- sharing_party %>%
  mutate(ymin = pmax(ymin, 0))

ggplot(
  sharing_party, # Filter!
  aes(
    x = vote_choice_all_fac,
    y = mean
  )
) +
  geom_point(size = 1.1) +
  geom_errorbar(aes(ymin = ymin, ymax = ymax),
    width = .3,
    size = 0.7
  ) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~variable,
    nrow = 2,
    ncol = 4,
    labeller = label_value
  ) +
  labs(
    x = "Vote intention (neg. CI values replaced with 0)",
    y = "Sharing intention"
  ) +
  ylim(c(0, 1))
```
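Because the sharing outcomes are binary, the normal-approximation interval (mean ± 1.96 × SE) can extend below zero when few respondents intend to share, which is why negative lower bounds are set to 0 in Figure \@ref(fig-A5). Purely as a point of comparison, the sketch below contrasts that approach with the Wilson score interval returned by `prop.test()`, which always lies within [0, 1]. The counts are invented for illustration and do not come from our data.

```r
# Invented example: 1 of 40 respondents intends to share
shares <- c(rep(1, 1), rep(0, 39))
n <- length(shares)
p <- mean(shares)

# Normal approximation as used for the figure, truncated at 0
se <- sd(shares) / sqrt(n)
normal_ci <- c(max(p - 1.96 * se, 0), p + 1.96 * se)

# Wilson score interval (with continuity correction by default)
wilson_ci <- as.numeric(prop.test(sum(shares), n)$conf.int)

normal_ci
wilson_ci
```

With proportions this close to zero the two intervals differ visibly, so the truncated bars are best read as a rough indication of uncertainty.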


\pagebreak


## Believing and sharing

Figure \@ref(fig-A6) visualizes two correlation matrices showing the Pearson correlations between the belief and the sharing measures for Reports 1 and 5, respectively. First, the correlations between our belief measure (scale: 0-6) and our sharing measures (scale: 0/1) seem quite low, especially given that the latter are asked directly after the belief measure. Second, the correlations among the different sharing measures are higher, but still not particularly high.

```{r fig-A6, echo=FALSE, fig.align="center", fig.cap="Correlation plot: Outcomes for Report 1 and 5\\label{fig-A6}", fig.height=4, message=FALSE, warning=FALSE, paged.print=FALSE}

data_cor <- data %>%
  select(
    belief_report_1_num,
    share_report_1_email_num,
    share_report_1_fb_num,
    share_report_1_twitter_num,
    share_report_1_whatsapp_num
  )

names(data_cor) <- gsub(
  "Fb", "Facebook",
  str_to_title(gsub("_", " ", gsub("_num", "", names(data_cor))))
)
names(data_cor) <- str_replace_all(names(data_cor), "1 ", "1\n")

correlation_matrix <- round(cor(data_cor, use = "pairwise.complete.obs"), 1)

# Plot

p1 <- ggcorrplot(correlation_matrix,
  hc.order = TRUE,
  type = "lower",
  lab = TRUE,
  tl.cex = 9,
  show.diag = TRUE
)

data_cor <- data %>%
  select(
    belief_report_5_num,
    share_report_5_email_num,
    share_report_5_fb_num,
    share_report_5_twitter_num,
    share_report_5_whatsapp_num
  )
names(data_cor) <- gsub(
  "Fb", "Facebook",
  str_to_title(gsub("_", " ", gsub("_num", "", names(data_cor))))
)

names(data_cor) <- str_replace_all(names(data_cor), "5 ", "5\n")

correlation_matrix <- round(cor(data_cor, use = "pairwise.complete.obs"), 1)

# Plot

p2 <- ggcorrplot(correlation_matrix,
  hc.order = TRUE,
  type = "lower",
  lab = TRUE,
  tl.cex = 9,
  show.diag = TRUE
)

graph_combined <- ggarrange(p1, p2,
  common.legend = TRUE,
  legend = "bottom",
  ncol = 2
)
graph_combined
```
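A note on the Pearson correlations in Figure \@ref(fig-A6): when one variable is binary (sharing, 0/1) and the other is quasi-continuous (belief, 0-6), the Pearson coefficient coincides with the point-biserial correlation, which can be attenuated when the binary variable is very unevenly split. The sketch below illustrates the equivalence on simulated data. All variable names and parameter values are assumptions chosen for illustration.

```r
set.seed(1)
# Simulated data (assumption): belief on 0-6, binary sharing intention
belief <- sample(0:6, 200, replace = TRUE)
share <- rbinom(200, 1, plogis(-3 + 0.4 * belief))

# Pearson correlation as used in the correlation matrices
r_pearson <- cor(belief, share)

# Point-biserial formula: standardized mean difference times sqrt(p * q)
p <- mean(share)
m1 <- mean(belief[share == 1])
m0 <- mean(belief[share == 0])
sd_pop <- sqrt(mean((belief - mean(belief))^2)) # SD with divisor n
r_pb <- (m1 - m0) / sd_pop * sqrt(p * (1 - p))

all.equal(r_pearson, r_pb)
```

The factor `sqrt(p * (1 - p))` shrinks as the share of respondents willing to share moves away from 50%, which is one reason why low coefficients need not imply that belief and sharing are unrelated.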


\newpage

## Models {#sec:models}

\linespread{1}

```{r tab-A11, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h1a_lm, h1b_email_lm, h1b_fb_lm, h1b_twitter_lm, h1b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A11",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Source",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 1",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Source"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```


```{r tab-A12, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h2a_lm, h2b_email_lm, h2b_fb_lm, h2b_twitter_lm, h2b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A12",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Congruence",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Congruence"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```

```{r tab-A13, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h3a_lm, h3b_email_lm, h3b_fb_lm, h3b_twitter_lm, h3b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A13",
  table.placement = "H",
  column.sep.width = "1pt",
  title = "Linear regression: Source, congruence and interaction",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c(
    "Treat.: Source",
    "Treat.: Congruence",
    "Treat.: Source*Congruence"
  ),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```



\linespread{1.6}
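To aid the reading of the interaction models, the following sketch shows on simulated data that, in a 2x2 design estimated by OLS, the interaction coefficient equals the difference between the congruence effects in the two source conditions (a difference-in-differences of cell means). The data frame `sim`, the 0/1 coding of both treatments and the effect sizes are assumptions made for illustration only.

```r
set.seed(42)
n <- 400
# Simulated 2x2 design (all names and effect sizes are assumptions)
sim <- data.frame(
  source = rbinom(n, 1, 0.5),     # assumed coding: 1 = real, 0 = fake
  congruence = rbinom(n, 1, 0.5)  # assumed coding: 1 = congruent
)
sim$belief <- 3 + 0.5 * sim$source + 0.8 * sim$congruence -
  0.6 * sim$source * sim$congruence + rnorm(n)

# `a * b` expands to a + b + a:b, so main effects need not be repeated
fit <- lm(belief ~ source * congruence, data = sim)

# Cell means of the 2x2 design
cell <- tapply(sim$belief, list(sim$source, sim$congruence), mean)

# Interaction = difference in congruence effects between sources
did <- (cell["1", "1"] - cell["1", "0"]) - (cell["0", "1"] - cell["0", "0"])
all.equal(unname(coef(fit)["source:congruence"]), did)
```

For the binary sharing outcomes the same logic applies, with cell means being shares of respondents who intend to share (a linear probability model).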

\newpage

## Robustness checks {#sec:robustnesschecks}

We considered the following threats to the validity of our conclusions. First, we re-estimated all results with the congruence treatment constructed from tertiles instead of the median of the composite attitude index. As Appendix \@ref(sec:robust-tertile) illustrates, results are substantially the same. Effects on belief become even more pronounced. Second, we suspected that sharing intention may be different for frequent sharers, i.e., persons who tend to share "a few times a year" or more. In Appendix \@ref(sec:robust-frequentsharers) we provide descriptives for sharing behavior and show results for the subset of frequent sharers. Our previous finding that the source and congruence treatments affect the sharing propensity for Facebook is confirmed, with the effect becoming somewhat stronger. Third, experimental findings may be invalid if participants do not properly receive the treatments. In Appendix \@ref(sec:robust-manipcheck) we present descriptives regarding manipulation checks and results for the subsample of respondents who passed all four manipulation checks. Our findings remain robust, again with the evidence becoming more pronounced. Fourth, participants who leave the online questionnaire may either be distracted or be trying to verify the news reports we show them. Appendix \@ref(sec:robust-focus) provides re-estimations of our models for the subset of respondents who never left the questionnaire. Again, our findings seem robust. Only the congruence-source interaction, while substantially similar, becomes statistically insignificant. This might be due to the reduced sample size. 

### Alternative implementation of congruence treatment {#sec:robust-tertile}

\linespread{1}

```{r tab-A14, echo=FALSE, paged.print=FALSE, results="asis"}

h2a_lm_30 <- lm(belief_report_5_num ~ treatment_congruence_30, data = data)

h2b_email_lm_30 <- lm(share_report_5_email_num ~ treatment_congruence_30, data = data)

h2b_fb_lm_30 <- lm(share_report_5_fb_num ~ treatment_congruence_30, data = data)

h2b_twitter_lm_30 <- lm(share_report_5_twitter_num ~ treatment_congruence_30, data = data)

h2b_whatsapp_lm_30 <- lm(share_report_5_whatsapp_num ~ treatment_congruence_30, data = data)

stargazer(h2a_lm_30, h2b_email_lm_30, h2b_fb_lm_30, h2b_twitter_lm_30, h2b_whatsapp_lm_30,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A14",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Congruence (treatment defined by attitude tertiles)",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Congruence"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```

```{r tab-A15, echo=FALSE, paged.print=FALSE, results="asis"}

h3a_lm_30 <- lm(belief_report_5_num ~
treatment_source +
  treatment_congruence_30 +
  treatment_source * treatment_congruence_30, data = data)

h3b_email_lm_30 <- lm(share_report_5_email_num ~
treatment_source +
  treatment_congruence_30 +
  treatment_source * treatment_congruence_30, data = data)

h3b_fb_lm_30 <- lm(share_report_5_fb_num ~
treatment_source +
  treatment_congruence_30 +
  treatment_source * treatment_congruence_30, data = data)

h3b_twitter_lm_30 <- lm(share_report_5_twitter_num ~
treatment_source +
  treatment_congruence_30 +
  treatment_source * treatment_congruence_30, data = data)

h3b_whatsapp_lm_30 <- lm(share_report_5_whatsapp_num ~
treatment_source +
  treatment_congruence_30 +
  treatment_source * treatment_congruence_30, data = data)

stargazer(h3a_lm_30, h3b_email_lm_30, h3b_fb_lm_30, h3b_twitter_lm_30, h3b_whatsapp_lm_30,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A15",
  table.placement = "H",
  column.sep.width = "1pt",
  title = "Linear regression: Source, congruence and interaction (alternative implementation of congruence treatment)",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c(
    "Treat.: Source",
    "Treat.: Congruence",
    "Treat.: Source*Congruence"
  ),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```

\linespread{1.6}
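The construction of `treatment_congruence_30` is not shown in this appendix. Purely to illustrate how a median split differs from a tertile-based split, the sketch below codes a hypothetical attitude index both ways. Under the tertile variant, respondents in the middle third are set to missing because their attitudes do not clearly fall on either side. The variable names, the simulated index and the decision to drop the middle tertile are all assumptions and may differ from the actual implementation.

```r
set.seed(7)
# Hypothetical composite attitude index (higher = more favorable)
attitude_index <- rnorm(300)

# Median split: every respondent is classified
high_median <- as.numeric(attitude_index > median(attitude_index))

# Tertile split: only the bottom and top thirds are classified
cuts <- quantile(attitude_index, probs = c(1 / 3, 2 / 3))
high_tertile <- ifelse(attitude_index <= cuts[1], 0,
  ifelse(attitude_index > cuts[2], 1, NA)
)

table(high_median)
table(high_tertile, useNA = "ifany")
```

A split of this kind trades sample size for a sharper contrast, which would be consistent with effects becoming more pronounced while standard errors grow.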

### Subsample: Frequent sharers {#sec:robust-frequentsharers}

Figure \@ref(fig-A7) displays the prevalence of accounts across different platforms in our sample. As was to be expected, almost everyone has an email account, followed by Whatsapp, Facebook and Twitter.

```{r fig-A7, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Prevalence of accounts across platforms\\label{fig-A7}", fig.height=3, out.extra = '', fig.pos= "ht"}

data_plot <- data %>%
  dplyr::select(
    "account_email_fac",
    "account_fb_fac",
    "account_twitter_fac",
    "account_whatsapp_fac"
  ) %>% # only keep variables of interest
  gather(key = "variable", value = "value", account_email_fac:account_whatsapp_fac) %>%
  group_by(variable) %>%
  summarize(
    pct.Ja = mean(value == "Yes", na.rm = TRUE),
    pct.Inactive = mean(value == "Yes, but I dont use it", na.rm = TRUE),
    pct.Nein = mean(value == "No", na.rm = TRUE)
  ) %>%
  gather(key = "category", value = "value", pct.Ja:pct.Nein) %>%
  mutate(category = factor(category,
    levels = c("pct.Ja", "pct.Inactive", "pct.Nein"),
    ordered = TRUE
  )) %>%
  mutate(value = round(value, 2)) %>%
  mutate(value = 100 * value)

ggplot(data_plot, aes(x = variable, y = value)) +
  geom_bar(
    stat = "identity",
    width = 0.7,
    position = position_dodge(width = 0.8),
    aes(
      fill = factor(variable),
      alpha = factor(category)
    )
  ) +
  geom_text(
    position = position_dodge(width = 0.8),
    aes(
      alpha = factor(category),
      label = paste(value, "%", sep = "")
    ),
    vjust = 1.6,
    color = "black",
    size = 2
  ) +
  scale_fill_discrete(
    name = "Platform",
    breaks = c(
      "account_email_fac", "account_fb_fac",
      "account_twitter_fac", "account_whatsapp_fac"
    ),
    labels = c("Email", "Facebook", "Twitter", "Whatsapp")
  ) +
  scale_alpha_discrete(
    name = "Account",
    range = c(1, 0.5),
    labels = c("Yes", "Yes, but inactive", "No")
  ) +
  scale_x_discrete(labels = str_to_title(gsub(
    "fb", "facebook",
    gsub(
      "account_|_fac", "",
      unique(data_plot$variable)
    )
  ))) +
  xlab("Platforms") +
  ylab("Percentage (%)") +
  theme_light() +
  theme(
    axis.text.x = element_text(
      angle = 35, hjust = 1, vjust = 1,
      margin = margin(0.2, 0, 0.3, 0, "cm")
    ),
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm"),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )
```

Figure \@ref(fig-A8) displays sharing propensity for the different reports across platforms. Sharing intention is only measured for subjects who had affirmed having an account on that platform. The graph illustrates that the intention to share is highest on Facebook, followed by Twitter and Whatsapp.

```{r fig-A8, echo=FALSE, fig.cap="Sharing propensity across reports and platforms\\label{fig-A8}", fig.height=3, message=FALSE, warning=FALSE, out.extra=''}

data_plot <- data %>%
  select(
    "share_report_1_email_num",
    "share_report_1_fb_num",
    "share_report_1_twitter_num",
    "share_report_1_whatsapp_num",
    "share_report_2_email_num",
    "share_report_2_fb_num",
    "share_report_2_twitter_num",
    "share_report_2_whatsapp_num",
    "share_report_5_email_num",
    "share_report_5_fb_num",
    "share_report_5_twitter_num",
    "share_report_5_whatsapp_num"
  ) %>% # only keep variables of interest
  gather(
    key = "variable", value = "value",
    share_report_1_email_num:share_report_5_whatsapp_num
  ) %>%
  group_by(variable) %>%
  summarize(
    N = n(),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci.error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf.low = mean - ci.error,
    conf.high = mean + ci.error
  )

data_plot$key <- str_to_title(gsub(" num", "", gsub("_", " ", data_plot$variable)))

data_plot$Type <- str_to_title(gsub(" Email| Fb| Twitter| Whatsapp", "", data_plot$key))

data_plot$key <- gsub("Share Report [0-9] ", "", data_plot$key)

data_plot <- data_plot %>%
  mutate(known = as.numeric(str_detect(variable, "know"))) %>%
  mutate(mean = round(mean, 2)) %>%
  mutate(mean = 100 * mean)

ggplot(data_plot, aes(x = Type, y = mean)) + # reorder(Type, -mean)
  geom_bar(
    stat = "identity",
    width = 0.7,
    position = position_dodge(width = 0.8),
    aes(
      fill = factor(Type),
      alpha = factor(key)
    )
  ) +
  geom_text(
    position = position_dodge(width = 0.8),
    aes(
      alpha = factor(key),
      label = paste(mean, "%", sep = "")
    ),
    vjust = 1.6,
    color = "black",
    size = 2
  ) +
  scale_fill_discrete(
    name = "Report",
    breaks = c("Share Report 1", "Share Report 2", "Share Report 5"),
    labels = c("Share Report 1", "Share Report 2", "Share Report 5")
  ) +
  scale_alpha_discrete(
    name = "Platform",
    range = c(1, .5),
    labels = c("Email", "Facebook", "Twitter", "Whatsapp")
  ) +
  xlab("Sharing across platforms and reports") +
  ylab("Percentage (%)") +
  theme_light() +
  theme(
    axis.text.x = element_text(
      angle = 20, hjust = 1, vjust = 1,
      margin = margin(0.2, 0, 0.3, 0, "cm")
    ),
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm"),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )
```

Before the experimental stage, we measured people's general tendency to share. In the following, we repeat our analyses for the sharing outcomes on a subsample of frequent sharers. We define these subjects as those who indicated that they share "a few times a year" or more. Figure \@ref(fig-A9) visualizes the reported sharing frequency across platforms. Since sharing frequency differs across the four services for individual subjects, the following models each use a different subsample (e.g., the regression on Facebook sharing includes only those subjects who frequently share via Facebook). Tables \@ref(tab:tab-A16) and \@ref(tab:tab-A17) do not reveal any patterns different from the main analyses. 


```{r fig-A9, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Sharing frequency across platforms\\label{fig-A9}", fig.height=3, out.extra = '', fig.pos= "ht"}

data_plot <- data %>%
  dplyr::select(
    "sharing_email_fac",
    "sharing_fb_fac",
    "sharing_twitter_fac",
    "sharing_whatsapp_fac"
  ) %>% # only keep variables of interest
  gather(key = "variable", value = "value", sharing_email_fac:sharing_whatsapp_fac) %>%
  group_by(variable) %>%
  summarize(
    pct.Taeglich = mean(value == "Taeglich", na.rm = TRUE),
    pct.Seltener = mean(value == "Seltener", na.rm = TRUE),
    pct.EinmalproWoche = mean(value == "Einmal pro Woche", na.rm = TRUE),
    pct.EinpaarMalimMonat = mean(value == "Ein paar Mal im Monat", na.rm = TRUE),
    pct.EinpaarMalimJahr = mean(value == "Ein paar Mal im Jahr", na.rm = TRUE)
  ) %>% # share of respondents per frequency category
  gather(key = "category", value = "value", pct.Taeglich:pct.EinpaarMalimJahr) %>%
  mutate(category = factor(category, levels = c(
    "pct.Seltener",
    "pct.EinpaarMalimJahr",
    "pct.EinpaarMalimMonat",
    "pct.EinmalproWoche",
    "pct.Taeglich"
  ), ordered = TRUE)) %>%
  mutate(value = round(value, 2)) %>%
  mutate(value = 100 * value)



library(ggplot2)
ggplot(data_plot, aes(x = variable, y = value)) +
  geom_bar(
    stat = "identity",
    width = 0.7,
    position = position_dodge(width = 0.8),
    aes(
      fill = factor(variable),
      alpha = factor(category)
    )
  ) +
  geom_text(
    position = position_dodge(width = 0.8),
    aes(
      alpha = factor(category),
      label = paste(value, "%", sep = "")
    ),
    vjust = 1.6,
    color = "black",
    size = 2
  ) +
  scale_fill_discrete(
    name = "Platform",
    breaks = c("sharing_email_fac", "sharing_fb_fac", "sharing_twitter_fac", "sharing_whatsapp_fac"),
    labels = c("Email", "Facebook", "Twitter", "Whatsapp")
  ) +
  scale_alpha_discrete(
    name = "Sharing frequency",
    range = c(1, 0.5),
    labels = c("Rarer", "A few times a year", "A few times a month", "Once a week", "Daily")
  ) +
  scale_x_discrete(labels = str_to_title(gsub("fb", "facebook", gsub("sharing_|_fac", "", unique(data_plot$variable))))) +
  xlab("Platforms") +
  ylab("Percentage (%)") +
  theme_light() +
  theme(
    axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1, margin = margin(0.2, 0, 0.3, 0, "cm")),
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm"),
    panel.grid.major.x = element_blank(),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )
```


```{r subsample-checksharers}

#### Subsets of data only including frequent sharers (those who share "a few times a year" or more)

data_email_sharers <- data %>% filter(sharing_email_num > 0)
data_fb_sharers <- data %>% filter(sharing_fb_num > 0)
data_whatsapp_sharers <- data %>% filter(sharing_whatsapp_num > 0)
data_twitter_sharers <- data %>% filter(sharing_twitter_num > 0)
```
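Since each sharing outcome is analysed on the frequent sharers of the same platform, outcome and subsample have to be paired consistently. One way to keep that pairing explicit is to derive both variable names from a single platform label, as in the following sketch. The simulated data frame `sim`, the helper `fit_sharers()` and the assumption that a value above 0 marks a frequent sharer are illustrative and not taken from the replication data.

```r
set.seed(3)
platforms <- c("email", "fb", "twitter", "whatsapp")

# Simulated stand-in for the survey data (all values are assumptions)
sim <- data.frame(treatment_source = rbinom(500, 1, 0.5))
for (p in platforms) {
  sim[[paste0("sharing_", p, "_num")]] <- sample(0:4, 500, replace = TRUE)
  sim[[paste0("share_report_1_", p, "_num")]] <- rbinom(500, 1, 0.2)
}

# Fit each platform's model on that platform's frequent sharers only,
# building outcome and filter variable from the same platform label
fit_sharers <- function(p, data) {
  sub <- data[which(data[[paste0("sharing_", p, "_num")]] > 0), ]
  lm(reformulate("treatment_source",
    response = paste0("share_report_1_", p, "_num")
  ), data = sub)
}
models <- lapply(setNames(platforms, platforms), fit_sharers, data = sim)

# Number of observations per platform-specific model
sapply(models, nobs)
```

The resulting list can be passed to a table function in a fixed order, so that column labels and models cannot drift apart.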

```{r tab-A16, echo=FALSE, paged.print=FALSE, results="asis"}

h1b_email_lm_sharers <- lm(share_report_1_email_num ~
treatment_source, data = data_email_sharers)
h1b_fb_lm_sharers <- lm(share_report_1_fb_num ~
treatment_source, data = data_fb_sharers)
# Each sharing outcome is modeled on the frequent sharers of the same platform
h1b_twitter_lm_sharers <- lm(share_report_1_twitter_num ~
treatment_source, data = data_twitter_sharers)
h1b_whatsapp_lm_sharers <- lm(share_report_1_whatsapp_num ~
treatment_source, data = data_whatsapp_sharers)

stargazer(h1b_email_lm_sharers, h1b_fb_lm_sharers,
  h1b_twitter_lm_sharers, h1b_whatsapp_lm_sharers,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A16",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Source",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 1",
  dep.var.labels = c("Share Email", "Share Facebook", "Share Twitter", "Share Whatsapp"),
  covariate.labels = c("Treat.: Source"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: subsample of frequent sharers;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```

```{r tab-A17, paged.print=FALSE, results="asis"}

h2b_email_lm_sharers <- lm(share_report_5_email_num ~
treatment_congruence50, data = data_email_sharers)
h2b_fb_lm_sharers <- lm(share_report_5_fb_num ~
treatment_congruence50, data = data_fb_sharers)
# Each sharing outcome is modeled on the frequent sharers of the same platform
h2b_twitter_lm_sharers <- lm(share_report_5_twitter_num ~
treatment_congruence50, data = data_twitter_sharers)
h2b_whatsapp_lm_sharers <- lm(share_report_5_whatsapp_num ~
treatment_congruence50, data = data_whatsapp_sharers)

stargazer(h2b_email_lm_sharers, h2b_fb_lm_sharers,
  h2b_twitter_lm_sharers, h2b_whatsapp_lm_sharers,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A17",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Congruence",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c("Share Email", "Share Facebook", "Share Twitter", "Share Whatsapp"),
  covariate.labels = c("Treat.: Congruence"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: subsample of frequent sharers;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```


\newpage


### Subsample: Manipulation check passers {#sec:robust-manipcheck}

We included tests to explore whether respondents received the manipulations as intended. Although some effects could materialize without the subjects being conscious of the manipulated variable, we would argue that a conscious perception provides the more plausible causal mechanism [see discussion in @MutzPemantle2015, pp. 193-197]. An effect of the source treatment is most plausible if respondents are aware of the source. To check respondents' awareness of the source, we asked them after the experimental stage which source the reports had come from. A large majority of respondents (`r round(prop.table(table(data$source_check_correct))[2], 3)*100`%) remembered the source correctly. The congruence treatment requires that respondents are aware of the implications of the report they read. Here, we checked whether they really read the reports by asking them about aspects of the stories after Reports 2--4.

```{r subsample-checkmanipulation}
data$manipulation_check_sum <- data$report_2_check_correct +
  data$report_3_check_correct +
  data$report_4_check_correct +
  data$source_check_correct

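# Fix the levels explicitly so that labels stay correct even if a count
# between 0 and 4 does not occur in the data
data$manipulation_check_sum_fac <- factor(
  data$manipulation_check_sum,
  levels = 0:4,
  labels = c(
    "0 passed",
    "1 passed",
    "2 passed",
    "3 passed",
    "4 passed"
  )
)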

data_passed <- data %>% filter(manipulation_check_sum == 4)
```

Respondents mostly paid attention to the reports. Figure \@ref(fig-A10) visualizes the distribution of correct answers to our manipulation check questions. The manipulation check for Report 2 yielded `r round(prop.table(table(data$report_2_check_correct))[2], 3)*100`% correct answers, for Report 3 `r round(prop.table(table(data$report_3_check_correct))[2], 3)*100`% correct answers, and for Report 4 `r round(prop.table(table(data$report_4_check_correct))[2], 3)*100`% correct answers. The manipulation check for the source yielded `r round(prop.table(table(data$source_check_correct))[2], 3)*100`% correct answers. Nonetheless, we decided to re-estimate all our models keeping only those `r table(data$manipulation_check_sum)[5]` respondents who passed all the manipulation checks. 

```{r fig-A10, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Distribution of manipulation check performance\\label{fig-A10}", fig.height=3.5, out.extra = '', fig.pos= "ht"}

ggplot(data, aes(x = manipulation_check_sum_fac)) +
  geom_bar(fill = "gray") +
  labs(x = "Number of passed manipulation check questions", y = "Frequency") +
  theme_light() +
  theme(
    axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1, margin = margin(0.2, 0, 0.3, 0, "cm")),
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(0.5, 0.5, 0, 0.5, "cm")
  )
```
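The percentages of correct answers reported in this section are read off the second cell of a proportion table, which presupposes that both incorrect (0) and correct (1) answers occur. The sketch below, based on made-up vectors, shows an equivalent computation that does not depend on which values happen to be observed. The 0/1 coding of the check variables is an assumption.

```r
# Made-up check variable: 1 = correct, 0 = incorrect, NA = no answer
check <- c(1, 1, 0, 1, NA, 1, 0, 1)

# Approach used in the text: second cell of the proportion table
round(prop.table(table(check))[2], 3) * 100

# Equivalent, but independent of which values happen to occur
round(mean(check == 1, na.rm = TRUE), 3) * 100

# If every respondent answered correctly, the table has a single cell
all_correct <- c(1, 1, 1)
prop.table(table(all_correct))[2]     # NA: there is no second cell
mean(all_correct == 1, na.rm = TRUE)  # 1
```

Both approaches ignore missing answers, so the percentages refer to respondents who answered the respective question.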

Tables \@ref(tab:tab-A18), \@ref(tab:tab-A19) and \@ref(tab:tab-A20) and Figure \@ref(fig-A11) show re-estimations of the main results after excluding individuals who did not pass all manipulation checks. Results remain substantially unchanged and in some cases become even stronger, as a comparison of Table \@ref(tab:tab-A19) with the original Table \@ref(tab:tab-A12) shows.


\linespread{1}

```{r models-for-robustness-checkmanipulation, include=FALSE}

h1a_lm <- lm(belief_report_1_num ~ treatment_source, data = data_passed)
h1b_email_lm <- lm(share_report_1_email_num ~ treatment_source, data = data_passed)
h1b_fb_lm <- lm(share_report_1_fb_num ~ treatment_source, data = data_passed)
h1b_twitter_lm <- lm(share_report_1_twitter_num ~ treatment_source, data = data_passed)
h1b_whatsapp_lm <- lm(share_report_1_whatsapp_num ~ treatment_source, data = data_passed)

h2a_lm <- lm(belief_report_5_num ~ treatment_congruence50, data = data_passed)
h2b_email_lm <- lm(share_report_5_email_num ~ treatment_congruence50, data = data_passed)
h2b_fb_lm <- lm(share_report_5_fb_num ~ treatment_congruence50, data = data_passed)
h2b_twitter_lm <- lm(share_report_5_twitter_num ~ treatment_congruence50, data = data_passed)
h2b_whatsapp_lm <- lm(share_report_5_whatsapp_num ~ treatment_congruence50, data = data_passed)

h3a_lm <- lm(belief_report_5_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_passed)
h3b_email_lm <- lm(share_report_5_email_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_passed)
h3b_fb_lm <- lm(share_report_5_fb_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_passed)
h3b_twitter_lm <- lm(share_report_5_twitter_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_passed)
h3b_whatsapp_lm <- lm(share_report_5_whatsapp_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_passed)
```


```{r tab-A18, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h1a_lm, h1b_email_lm, h1b_fb_lm, h1b_twitter_lm, h1b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A18",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Source",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 1",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Source"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: subsample of those passing all manipulation checks;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```


\newpage

```{r tab-A19, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h2a_lm, h2b_email_lm, h2b_fb_lm, h2b_twitter_lm, h2b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A19",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Congruence",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Congruence"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: subsample of those passing all manipulation checks;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```



```{r tab-A20, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h3a_lm, h3b_email_lm, h3b_fb_lm, h3b_twitter_lm, h3b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A20",
  table.placement = "H",
  column.sep.width = "1pt",
  title = "Linear regression: Source, congruence and interaction",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c(
    "Treat.: Source",
    "Treat.: Congruence",
    "Treat.: Source*Congruence"
  ),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: subsample of those passing all manipulation checks;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```


\linespread{1.6}

```{r fig-A11, fig.cap="Source credibility development\\label{fig-A11}", message=FALSE, warning=FALSE, paged.print=FALSE, fig.height=3.5, out.extra = '', fig.pos= "ht", results = 'hide'}

credibility_development <- data_passed %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(
    belief_report_1_num,
    belief_report_5_num
  ) %>%
  group_by(N = n(), add = TRUE) %>%
  gather(
    key = "variable", value = "value",
    c("belief_report_1_num", "belief_report_5_num")
  ) %>%
  group_by(treatment_congruence50, treatment_source, variable) %>%
  summarize(
    N = mean(N),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci_error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf_low = mean - ci_error,
    conf_high = mean + ci_error
  ) %>%
  ungroup() %>%
  mutate(
    treatment_source =
      case_when(
        treatment_source == 0 ~ "Nachrichten 360",
        treatment_source == 1 ~ "Tagesschau"
      )
  ) %>%
  mutate(treatment_source = factor(treatment_source,
    ordered = TRUE,
    levels = c(
      "Nachrichten 360",
      "Tagesschau"
    )
  )) %>%
  mutate(
    treatment_congruence50 =
      case_when(
        treatment_congruence50 == 0 ~ "Incongruent",
        treatment_congruence50 == 1 ~ "Congruent"
      )
  ) %>%
  mutate(treatment_congruence50 = factor(treatment_congruence50,
    ordered = TRUE,
    levels = c(
      "Incongruent",
      "Congruent"
    )
  )) %>%
  mutate(variable = case_when(
    variable == "belief_report_1_num" ~ 1,
    variable == "belief_report_5_num" ~ 5
  )) %>%
  rename(
    "Groups_Congruence_treatment" = treatment_congruence50,
    "Groups_Source_treatment" = treatment_source
  )

ggplot(
  credibility_development,
  aes(
    x = variable,
    y = mean,
    group = interaction(Groups_Source_treatment, Groups_Congruence_treatment),
    color = Groups_Source_treatment,
    linetype = Groups_Congruence_treatment
  )
) +
  geom_point(size = 1) +
  geom_path() +
  geom_errorbar(aes(ymin = conf_low, ymax = conf_high),
    width = .1,
  ) +
  ylim(0, 6) +
  xlim(0.75, 5.25) +
  labs(y = "Belief (0-6)", x = "") +
  theme_classic() +
  scale_color_manual(
    name = "Treatment: Source",
    values = c("red", "blue")
  ) +
  scale_linetype_manual(
    name = "Treatment: Congruence",
    values = c("dashed", "solid")
  ) +
  scale_x_continuous(
    breaks = c(1, 5),
    minor_breaks = c(2, 3, 4),
    labels = c("Report 1", "Report 5")
  ) +
  theme_light() +
  theme(
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 11),
    plot.title = element_text(size = 13, hjust = 0.5),
    plot.margin = margin(2, 0, 0, 0.5, "cm")
  ) +
  ggplot2::annotate(
    geom = "text", x = 3, y = 0.5,
    label = "Reports 2-4: Content manipulated, \nbelief not measured.",
    color = "gray",
    size = 4
  )
```


\newpage

### Subsample: Always focused {#sec:robust-focus}



```{r focus-descriptives}

#### Load and run PageFocus functions
#### Source: Diedenhofen, Birk, and Jochen Musch. 2017. "Pagefocus: Using Paradata to Detect and Prevent Cheating on Online Achievement Tests." Behavior Research Methods 49 (4): 1444–59. Code available at https://github.com/deboerk/PageFocus.

source("input/pagefocus.r")

data <- cbind(data, pagefocus.analysis(data,
  pages_focus,
  statistics = c("loss", "count", "error", "duration")
))

### Defocus count across questionnaire

data <- data %>% mutate(defocus_count = rowSums(select(., ends_with("_count")), na.rm = TRUE))

### Defocussed pages across questionnaire

data <- data %>% mutate(defocussed_page_count = rowSums(select(., ends_with("_loss")), na.rm = TRUE))

### Total defocussing duration across questionnaire

data <- data %>% mutate(defocus_duration = rowSums(select(., ends_with("_duration")), na.rm = TRUE))
```

When asked about the truth of factual information, respondents might be tempted to take advantage of the unsupervised situation to verify reports [cf. @JensenThomsen2014]. We therefore adapted a JavaScript tool developed by @DiedenhofenMusch2017 to track whether and at what points people left the questionnaire. For each questionnaire page, our script stored a time stamp for each "defocus" and "refocus" event. In addition, when respondents left and returned to the questionnaire, a popup appeared asking them not to leave the questionnaire again. Over the whole questionnaire, `r round(prop.table(table(data$defocus_count == 0))[2]*100, 2)`% of respondents never left the questionnaire, and `r round(prop.table(table(data$defocus_count > 3))[2]*100, 2)`% left it more than three times. Of those who left the questionnaire at least once, the median time spent away was `r round(median(data$defocus_duration[data$defocus_count > 0]), 2)` seconds. In Tables \@ref(tab:tab-A21), \@ref(tab:tab-A22) and \@ref(tab:tab-A23) and Figure \@ref(fig-A12), we present robustness tests with only those subjects who did not leave the questionnaire. These tests show that the source-congruence interaction becomes insignificant, potentially because of the reduced sample size.
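
The aggregation of focus events into per-respondent counts and durations can be sketched as follows. This is a minimal illustration in base R and not the PageFocus code itself: the event log `events`, its column names and its values are hypothetical, and we assume that every "defocus" event is followed by a matching "refocus" event.

```r
# Hypothetical event log: one row per focus event (not the PageFocus format)
events <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2),
  page = c("report_1", "report_1", "report_5", "report_5", "report_1", "report_1"),
  type = c("defocus", "refocus", "defocus", "refocus", "defocus", "refocus"),
  time = c(10, 25, 40, 42, 5, 65) # seconds since the survey started
)

# Pair each defocus event with the refocus event that follows it
# (assumes the log is ordered and every defocus has a matching refocus)
defocus <- events[events$type == "defocus", ]
refocus <- events[events$type == "refocus", ]
away <- data.frame(id = defocus$id, duration = refocus$time - defocus$time)

# Number of defocus events and total time away (in seconds) per respondent
defocus_count <- tapply(away$duration, away$id, length)
defocus_duration <- tapply(away$duration, away$id, sum)
```

Respondents without any defocus event do not appear in such a log and would need to be assigned a count of zero before filtering on `defocus_count == 0`.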

\linespread{1}

```{r models-for-robustness-checkfocus, include=FALSE}

data_focussed <- data %>% filter(defocus_count == 0)

h1a_lm <- lm(belief_report_1_num ~ treatment_source, data = data_focussed)
h1b_email_lm <- lm(share_report_1_email_num ~ treatment_source, data = data_focussed)
h1b_fb_lm <- lm(share_report_1_fb_num ~ treatment_source, data = data_focussed)
h1b_twitter_lm <- lm(share_report_1_twitter_num ~ treatment_source, data = data_focussed)
h1b_whatsapp_lm <- lm(share_report_1_whatsapp_num ~ treatment_source, data = data_focussed)

h2a_lm <- lm(belief_report_5_num ~ treatment_congruence50, data = data_focussed)
h2b_email_lm <- lm(share_report_5_email_num ~ treatment_congruence50, data = data_focussed)
h2b_fb_lm <- lm(share_report_5_fb_num ~ treatment_congruence50, data = data_focussed)
h2b_twitter_lm <- lm(share_report_5_twitter_num ~ treatment_congruence50, data = data_focussed)
h2b_whatsapp_lm <- lm(share_report_5_whatsapp_num ~ treatment_congruence50, data = data_focussed)

h3a_lm <- lm(belief_report_5_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_focussed)
h3b_email_lm <- lm(share_report_5_email_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_focussed)
h3b_fb_lm <- lm(share_report_5_fb_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_focussed)
h3b_twitter_lm <- lm(share_report_5_twitter_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_focussed)
h3b_whatsapp_lm <- lm(share_report_5_whatsapp_num ~
treatment_source +
  treatment_congruence50 +
  treatment_source * treatment_congruence50, data = data_focussed)
```


```{r tab-A21, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h1a_lm, h1b_email_lm, h1b_fb_lm, h1b_twitter_lm, h1b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A21",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Source",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 1",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Source"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: Subsample of respondents who always stayed focused;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```



```{r tab-A22, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h2a_lm, h2b_email_lm, h2b_fb_lm, h2b_twitter_lm, h2b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A22",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Congruence",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Congruence"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: Subsample of respondents who always stayed focused;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```



```{r tab-A23, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h3a_lm, h3b_email_lm, h3b_fb_lm, h3b_twitter_lm, h3b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A23",
  table.placement = "H",
  column.sep.width = "1pt",
  title = "Linear regression: Source, congruence and interaction",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 5",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c(
    "Treat.: Source",
    "Treat.: Congruence",
    "Treat.: Source*Congruence"
  ),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: Subsample of respondents who always stayed focused;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```

\linespread{1.6}

```{r fig-A12, fig.cap="Source credibility development\\label{fig-A12}", message=FALSE, warning=FALSE, paged.print=FALSE, fig.height=3.5, out.extra = '', fig.pos= "ht", results = 'hide'}

credibility_development <- data_focussed %>%
  group_by(treatment_congruence50, treatment_source) %>%
  select(
    belief_report_1_num,
    belief_report_5_num
  ) %>%
  group_by(N = n(), add = TRUE) %>%
  gather(
    key = "variable", value = "value",
    c("belief_report_1_num", "belief_report_5_num")
  ) %>%
  group_by(treatment_congruence50, treatment_source, variable) %>%
  summarize(
    N = mean(N),
    mean = mean(value, na.rm = TRUE),
    var = var(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    na_sum = sum(is.na(value))
  ) %>%
  mutate(n = N - na_sum) %>%
  mutate(
    se = sqrt(var / n),
    ci_error = qt(0.975, df = n - 1) * sd / sqrt(n),
    conf_low = mean - ci_error,
    conf_high = mean + ci_error
  ) %>%
  ungroup() %>%
  mutate(
    treatment_source =
      case_when(
        treatment_source == 0 ~ "Nachrichten 360",
        treatment_source == 1 ~ "Tagesschau"
      )
  ) %>%
  mutate(treatment_source = factor(treatment_source,
    ordered = TRUE,
    levels = c(
      "Nachrichten 360",
      "Tagesschau"
    )
  )) %>%
  mutate(
    treatment_congruence50 =
      case_when(
        treatment_congruence50 == 0 ~ "Incongruent",
        treatment_congruence50 == 1 ~ "Congruent"
      )
  ) %>%
  mutate(treatment_congruence50 = factor(treatment_congruence50,
    ordered = TRUE,
    levels = c(
      "Incongruent",
      "Congruent"
    )
  )) %>%
  mutate(variable = case_when(
    variable == "belief_report_1_num" ~ 1,
    variable == "belief_report_5_num" ~ 5
  )) %>%
  rename(
    "Groups_Congruence_treatment" = treatment_congruence50,
    "Groups_Source_treatment" = treatment_source
  )

ggplot(
  credibility_development,
  aes(
    x = variable,
    y = mean,
    group = interaction(Groups_Source_treatment, Groups_Congruence_treatment),
    color = Groups_Source_treatment,
    linetype = Groups_Congruence_treatment
  )
) +
  geom_point(size = 1) +
  geom_path() +
  geom_errorbar(aes(ymin = conf_low, ymax = conf_high),
    width = .1,
  ) +
  ylim(0, 6) +
  xlim(0.75, 5.25) +
  labs(y = "Belief (0-6)", x = "") +
  theme_classic() +
  scale_color_manual(
    name = "Treatment: Source",
    values = c("red", "blue")
  ) +
  scale_linetype_manual(
    name = "Treatment: Congruence",
    values = c("dashed", "solid")
  ) +
  scale_x_continuous(
    breaks = c(1, 5),
    minor_breaks = c(2, 3, 4),
    labels = c("Report 1", "Report 5")
  ) +
  theme_light() +
  theme(
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 11),
    plot.title = element_text(size = 13, hjust = 0.5),
    plot.margin = margin(2, 0, 0, 0.5, "cm")
  ) +
  ggplot2::annotate(
    geom = "text", x = 3, y = 0.5,
    label = "Reports 2-4: Content manipulated, \nbelief not measured.",
    color = "gray",
    size = 4
  )
```


\newpage

### Channel: Facebook vs. website treatment design {#sec:robustwebsitetreatment}

The low willingness to share, especially via email, Whatsapp and Twitter, might be due to the fact that we partly presented the stimuli as Facebook posts. We can address this concern by examining the third treatment dimension of the experiment (results for which we report elsewhere). We randomly varied whether respondents were exposed to the stimuli as screenshots of a Facebook post or of a news site (cf. Figures \@ref(fig-A16) and \@ref(fig-A17) for the respective designs). As shown in Table \@ref(tab:tab-A24), we do not find that this treatment affects sharing intentions across non-Facebook channels. This leads us to believe that the low willingness to share is not an artefact of our design.




```{r models-for-tab-A24, include=FALSE}
h1a_lm <- lm(belief_report_1_num ~ treatment_channel, data = data)
h1b_email_lm <- lm(share_report_1_email_num ~ treatment_channel, data = data)
h1b_fb_lm <- lm(share_report_1_fb_num ~ treatment_channel, data = data)
h1b_twitter_lm <- lm(share_report_1_twitter_num ~ treatment_channel, data = data)
h1b_whatsapp_lm <- lm(share_report_1_whatsapp_num ~ treatment_channel, data = data)
```


```{r tab-A24, echo=FALSE, paged.print=FALSE, results="asis"}

stargazer(h1a_lm, h1b_email_lm, h1b_fb_lm, h1b_twitter_lm, h1b_whatsapp_lm,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A24",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Linear regression: Channel treatment",
  align = TRUE,
  dep.var.caption = "Dependent variables for Report 1",
  dep.var.labels = c(
    "Belief", "Share Email",
    "Share Facebook", "Share Twitter", "Share Whatsapp"
  ),
  covariate.labels = c("Treat.: Channel"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE,
  notes = c("*p<0.05; **p<0.01; ***p<0.001; Data: Subsample of respondents who were in the website design treatment;", "Belief measured on 7-point scale (0-6) and sharing intention as a binary choice (0-1)."),
  header = FALSE,
  font.size = "scriptsize"
)
```



\newpage

## Ethical considerations and open feedback {#sec:openfeedback}

We sought deception-free ways to study our hypotheses, but minor deception was necessary to test them in the cleanest way. In the final design, we presented participants with constructed news sources as well as constructed news reports [for a similar approach, see @BaumGroeling2009; @Kuruetal2017; @Junetal2017]. To mitigate potential negative effects of deception, we resorted to several strategies. First, we debriefed subjects in detail at the end of the survey: We informed participants about the objective of our experiment and clarified that *Nachrichten 360* was a made-up source. We also clarified that the factual claims and conclusions of three reports were manipulated, and briefed participants with true facts about the respective issues. We also emphasized that "tagesschau.de referred only to [these] facts" in order not to taint the reputation of the Tagesschau. Second, we sent consenting respondents an email with more substantive information related to the news reports a few weeks after the study was finished.

Finally, we provided participants with an open-ended feedback box after the debriefing. A qualitative review of the feedback revealed that the survey experience was mainly positive, and fewer than five participants had complaints. None of the participants suggested that the study should not have been run because of the deception it contained. In the following, we also present some quantitative analysis of the open feedback. In total, `r round(mean(!is.na(data$debrief_feedback))*100, 2)` percent of respondents provided written feedback. Figure \@ref(fig-A13) provides a simple, translated wordcloud (stopwords were omitted) that indicates an overall positive sentiment.

```{r data-feedback}


library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library("tidytext")
library("kableExtra")
library(ggwordcloud)

data_feedback <- tibble(
  line = seq_along(data$debrief_feedback),
  text = data$debrief_feedback
)
data_feedback <- data_feedback %>% mutate(text = tolower(text))

# Tokenization and transformation into tidy structure
text_df <- data_feedback %>%
  unnest_tokens(word, text)
```

```{r fig-A13, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Wordcloud of open-ended feedback\\label{fig-A13}", fig.height=3, out.extra = '', fig.pos= "ht"}

# Wordcloud data
text_df_agg <- text_df %>% #   # Count words (frequency)
  anti_join(get_stopwords(language = "de")) %>%
  dplyr::count(word, sort = TRUE) %>%
  top_n(20) %>%
  na.omit()

text_df_agg$word_english <- c("interesting", "survey", "for", "that", "interesting", "study", "more", "news", "thank you", "always", "media", "good", "survey", "opinion", "find", "total", "with pleasure", "fake", "done", "give", "news", "already", "over")

# Wordcloud
ggplot(text_df_agg, aes(label = word_english, size = n)) +
  geom_text_wordcloud_area(area_corr_power = 1, rm_outside = TRUE) +
  scale_size_area(max_size = 10) +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 0))
```

```{r data-dictionary}

# Sentiment analysis (the dictionary needs to be German!)

## Take the full word lists not the stems

dictionary <- c(
  # Positive words
  readLines("input/data_SentiWS_v2.0_Positive.txt", encoding = "UTF-8"),
  # Negative words
  readLines("input/data_SentiWS_v2.0_Negative.txt",
    encoding = "UTF-8"
  )
) %>%
  lapply(function(x) {
    # Extract the individual columns
    res <- strsplit(x, "\t", fixed = TRUE)[[1]]
    return(data.frame(
      stem = res[1], value = res[2], words = res[3],
      stringsAsFactors = FALSE
    ))
  }) %>%
  bind_rows()

## Clean stem and merge with words, select subset
dictionary <- dictionary %>%
  mutate(stem = gsub("\\|NN", "", stem)) %>%
  mutate(word = paste(stem, words, sep = ",")) %>%
  select(value, word)

## Write the words into rows with each having sentiment value
dictionary <- separate_rows(dictionary, word, sep = ",") %>%
  mutate(word = tolower(word)) %>%
  arrange(word) %>%
  filter(!is.na(word)) %>%
  filter(!word == "")

## Further cleaning
x <- dictionary %>% filter(str_detect(word, "\\|"))
dictionary$word <- gsub("\\|adjx", "", dictionary$word)
dictionary$word <- gsub("\\|vvinf", "", dictionary$word)
dictionary$word <- gsub("\\|adv", "", dictionary$word)
```

Subsequently, we used a German sentiment dictionary provided by @Remus2010-iv to estimate sentiment scores for the responses. The corresponding dictionary contains around `r format(as.numeric(nrow(dictionary)), nsmall=0, big.mark=",")` positive and negative basic forms, corresponding to around 16,000 positive and 18,000 negative word forms. The dictionary comprises adjectives and adverbs, but also nouns and verbs that carry sentiment. Figure \@ref(fig-A14) displays the distribution of sentiment scores for the feedback responses; very few responses receive negative scores.
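
The dictionary-based scoring can be illustrated with a toy example in base R. This is only a sketch: the three dictionary entries and their polarity weights are hypothetical and not taken from the dictionary used here, and `score_response()` is a helper name introduced for this illustration. As in our analysis, a response is scored by the mean polarity of its matched words.

```r
# Toy dictionary with hypothetical polarity weights (not the actual values)
toy_dictionary <- data.frame(
  word = c("gut", "interessant", "schlecht"),
  value = c(0.4, 0.3, -0.8),
  stringsAsFactors = FALSE
)

responses <- c("Sehr gut und interessant", "Schlecht gemacht", "Danke")

score_response <- function(text, dictionary) {
  # Lowercase the response and split it on anything that is not a letter
  tokens <- strsplit(tolower(text), "[^[:alpha:]]+")[[1]]
  # Look up each token; tokens missing from the dictionary yield NA
  values <- dictionary$value[match(tokens, dictionary$word)]
  # Mean polarity of the matched words; NaN if no word is matched
  mean(values, na.rm = TRUE)
}

scores <- vapply(responses, score_response, numeric(1),
  dictionary = toy_dictionary, USE.NAMES = FALSE
)
```

The third response contains no dictionary word and therefore receives no score, which is why some feedback responses are missing from the distribution of sentiment scores.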

```{r data-sentiment-generate}

## Get sentiment for each word in a feedback response
sentiments_word_level <- left_join(text_df, dictionary, by = "word") %>%
  mutate(value = as.numeric(value))
# Wordstems are not matched: Quite a few responses don't get a score

## Aggregate sentiment by line/response
sentiments_response_level <- sentiments_word_level %>%
  group_by(line) %>%
  summarize(mean_value = mean(value, na.rm = TRUE))


## Merge with the feedback data
data_feedback <- left_join(data_feedback, sentiments_response_level, by = "line")
# Note: responses without any dictionary match receive no sentiment score

## Only keep rows with responses
data_feedback <- data_feedback[!is.na(data_feedback$text), ]

## Sort according to sentiment score
data_feedback <- data_feedback %>% arrange(mean_value)
```

```{r fig-A14, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Distribution of sentiment scores\\label{fig-A14}", fig.height=3, out.extra = '', fig.pos= "ht"}
# Histogram of sentiment scores
ggplot(data_feedback, aes(x = mean_value)) +
  geom_histogram(fill = "gray") +
  theme_classic() +
  labs(x = "Distribution of sentiment scores\nof open feedback responses", y = "N") +
  geom_vline(aes(xintercept = 0), linetype = "dashed") +
  xlim(-1, 1)
```


\newpage


## Some screenshots from the survey {#sec:screenshots}

Figure \@ref(fig-A15) provides an example of a news report subjects were asked to read: The screenshot presented as taken from the original website or Facebook post on top, the text of the report below. Figures \@ref(fig-A16) and \@ref(fig-A17) display (2 $\times$ 2 $\times$ 2) screenshots for one example (Report 2) across the three treatment dimensions source, channel and content, only two of which we discuss in the present paper. Note that screenshots were further adapted for use on mobile devices (not shown here). 


\begin{figure}[H]
\centering
\caption{Example of report presentation}\label{fig-A15}
		\includegraphics[width=.4\linewidth]{input/fig-A15}
\vspace{-0.5cm}

\begin{flushleft}
{ 
\scriptsize



\normalsize
}
\end{flushleft}
\end{figure}

\begin{figure}[H]
\centering
\caption{Nachrichten 360: Example of screenshots used in the treatments}\label{fig-A16}
		\includegraphics[width=.4\linewidth]{input/fig-A16-1}
		\includegraphics[width=.4\linewidth]{input/fig-A16-2}
		\includegraphics[width=.45\linewidth]{input/fig-A16-3}
		\includegraphics[width=.45\linewidth]{input/fig-A16-4}
\vspace{-0.5cm}

\begin{flushleft}
{ 
\scriptsize



\normalsize
}
\end{flushleft}
\end{figure}


\begin{figure}[H]
\centering
\caption{Tagesschau: Example of screenshots used in the treatments}\label{fig-A17}
		\includegraphics[width=.4\linewidth]{input/fig-A17-1}
		\includegraphics[width=.4\linewidth]{input/fig-A17-2}
		\includegraphics[width=.45\linewidth]{input/fig-A17-3}
		\includegraphics[width=.45\linewidth]{input/fig-A17-4}
\begin{flushleft}
\end{flushleft}
\end{figure}


\newpage

## Treatment heterogeneity {#sec:treatmentheterogeneity}
### A quick overview of causal forests {#sec:thoverview}

To estimate treatment heterogeneity we rely on the "causal forest" method developed by @Athey2016-ow [also @Wager2018-tk; @Athey2019-fy]. The method goes back to classification and regression forest algorithms [@Breiman2001-ou]. For an extensive overview of the underlying procedures of causal forests we refer the reader to the website corresponding to the grf R package [@Tibshirani2020-uw]. 

A causal forest is composed of a group of *trees*. For each tree, a random subset of the data is sampled. In a next step, this subset is divided into two disjoint sets, the first for building the tree, and the second for repopulating the tree. This step is required by the condition of "honesty". To build a tree, the algorithm seeks to recursively split the data. This is done by first selecting a random subset of variables as candidates for splitting. Then, for each of these variables, all of its possible values and the resulting two *child nodes* are considered. The goodness of a split depends on how much it increases treatment heterogeneity between the two child nodes. Certain splits are not considered, because the resulting child nodes would be too different in size or too small to compare treatment groups within them. All observations with values for the split variable that are less than or equal to the split value are placed in a new left child node, and all observations with greater values are placed in a right child node. If a node cannot be split further, it forms a *leaf* of the final tree.
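
The search for a split value on a single candidate variable can be sketched in base R as follows. This is a deliberately simplified illustration on simulated data and not the criterion implemented in `grf`, which relies on a computationally cheaper approximation. The helper names `effect()` and `best_split()` and the minimum of 25 units per treatment arm are assumptions made for this example.

```r
set.seed(1)
n <- 1000
x <- runif(n) # candidate split variable
w <- rbinom(n, 1, 0.5) # randomized binary treatment
# Simulated outcome: the treatment effect is 0 for x <= 0.5 and 2 for x > 0.5
y <- 2 * w * (x > 0.5) + rnorm(n, sd = 0.5)

# Difference in mean outcomes between treated and control units
effect <- function(y, w) mean(y[w == 1]) - mean(y[w == 0])

best_split <- function(x, y, w, min_per_arm = 25) {
  candidates <- sort(unique(x))
  candidates <- candidates[-length(candidates)] # right child must not be empty
  gain <- vapply(candidates, function(s) {
    left <- x <= s
    # Skip splits whose child nodes lack enough treated or control units
    arm_sizes <- c(
      sum(w[left] == 1), sum(w[left] == 0),
      sum(w[!left] == 1), sum(w[!left] == 0)
    )
    if (min(arm_sizes) < min_per_arm) {
      return(NA_real_)
    }
    # Heterogeneity criterion: squared gap between the child-node effects
    (effect(y[left], w[left]) - effect(y[!left], w[!left]))^2
  }, numeric(1))
  candidates[which.max(gain)] # which.max() ignores the skipped splits
}

split_value <- best_split(x, y, w)
```

In this simulation the selected split value should lie close to 0.5, the point at which the simulated treatment effect changes.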

Now the "repopulation" data from the honesty split is used to populate the tree's leaf nodes: each new observation is "pushed down" the tree, and added to the leaf in which it falls. The honesty procedure avoids the problem of adaptive estimation, which describes a situation in which spurious extreme values of the outcome determine splits and bias estimation [cf. @Athey2016-ow, p. 7355]. Figure \@ref(fig-A18) visualizes a tree from the forest on which Figure \@ref(fig-7) is based (heterogeneity of the source effect on belief). The diagram shows both the variables on which the data was split and the values at which splits occurred. Purple boxes represent the leaves, after which no further splits occur.


```{r fig-A18, eval=FALSE, fig.cap="Part of exemplary tree estimated on the basis of the present data\\label{fig-A18}", fig.width=10, message=FALSE, warning=FALSE, include=FALSE}
#plot(get_tree(tau.forest.example.tree, 1))
# Manual export from RStudio
```


\begin{figure}[H]
\centering
\caption{Part of exemplary tree estimated on the basis of the present data}\label{fig-A18}
		\includegraphics[width=1\linewidth]{input/fig-A18.png}
\end{figure}


To grow a forest, this procedure is repeated many times, each time randomly re-sampling from the data. Following @AtheyWager2019, one first grows a pilot forest using all covariates in the data and then checks each variable's "importance": The importance of a variable is calculated as a simple weighted sum of how many times that variable was split on at each depth in the forest. In a second step, a forest is regrown using only those covariates that had above average importance. Based on this forest, the ultimate objective is to estimate a treatment effect for each individual observation. This is done in the following way: According to the "out-of-bag" procedure, for each observation, all trees that did not use this observation (due to random sampling of observations) are identified. In these trees, the observation is "pushed down" into the appropriate leaf. Next, a list of neighboring observations is created, weighted by how many times they fell into the same leaf as the observation of interest. The predicted treatment effect for the observation is calculated using the outcomes and treatment status of neighboring observations.
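
The final step, turning weighted neighbors into a predicted treatment effect, can be sketched in base R. The outcomes, treatment indicators and forest weights below are hypothetical values for a single target observation, and `weighted_effect()` is a name introduced for this illustration. The sketch omits the preliminary centering of outcome and treatment that `grf` applies and should be read as a simplified version of the weighted estimate.

```r
# Hypothetical neighbors of one target observation
y <- c(4, 5, 2, 3, 6, 1) # outcomes
w <- c(1, 1, 0, 0, 1, 0) # treatment status
alpha <- c(0.30, 0.25, 0.20, 0.15, 0.05, 0.05) # forest weights, summing to 1

weighted_effect <- function(y, w, alpha) {
  # Weighted means of outcome and treatment among the neighbors
  y_bar <- sum(alpha * y)
  w_bar <- sum(alpha * w)
  # Slope of a weighted regression of the outcome on the treatment
  sum(alpha * (w - w_bar) * (y - y_bar)) / sum(alpha * (w - w_bar)^2)
}

tau_hat <- weighted_effect(y, w, alpha)

# With a binary treatment this equals the weighted difference in means
diff_means <- weighted.mean(y[w == 1], alpha[w == 1]) -
  weighted.mean(y[w == 0], alpha[w == 0])
```

Neighbors that share a leaf with the target observation more often receive a larger weight and therefore contribute more to its predicted effect.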

Apart from assessing the relevance of covariates for these predictions, several approaches allow one to test whether the heterogeneity found is real [cf. @AtheyWager2019]. As a first omnibus test of treatment heterogeneity, one can group observations according to whether their predicted treatment effects are below or above the median prediction, estimate the average treatment effect for each group, and then calculate the difference between the two estimates along with its standard error, which is derived from the standard errors of the two group estimates. As @AtheyWager2019 [p. 7] point out, this procedure is somewhat heuristic but gives qualitative insights about the strength of heterogeneity. A second omnibus test of heterogeneity is motivated by the "best linear predictor" method, which tries to fit the individual treatment effects as a linear function of the causal forest estimates. The coefficients of this model provide evidence on whether or not the causal forest succeeded in finding heterogeneity.
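
As a minimal sketch of the first omnibus test, the difference between the two group estimates and its confidence interval can be computed as follows. The helper `diff_high_low()` and the input values are hypothetical; the inputs are assumed to be named vectors with an `estimate` and a `std.err` element, and the standard error formula treats the two group estimates as independent, which is part of what makes the procedure heuristic.

```r
# Difference between the average effects in the high- and low-prediction
# groups, with a normal-approximation confidence interval
diff_high_low <- function(ate_high, ate_low, level = 0.95) {
  est <- ate_high[["estimate"]] - ate_low[["estimate"]]
  se <- sqrt(ate_high[["std.err"]]^2 + ate_low[["std.err"]]^2)
  z <- qnorm(1 - (1 - level) / 2)
  c(
    estimate = est,
    std.err = se,
    ci_low = est - z * se,
    ci_high = est + z * se
  )
}

# Invented group estimates, for illustration only
res <- diff_high_low(
  ate_high = c(estimate = 0.30, std.err = 0.03),
  ate_low = c(estimate = 0.10, std.err = 0.04)
)
covers_zero <- res[["ci_low"]] <= 0 && res[["ci_high"]] >= 0
```

If the interval does not cover zero, the two regions differ in their average effects, which hints at heterogeneity.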

These omnibus tests inform about the presence of heterogeneity in general but say nothing about the significance of individual variables. Apart from visual inspection of individual treatment effect predictions plotted against covariates, one way to test whether the covariates used to grow the final forest significantly predict heterogeneity is the "best linear projection" method of the `grf` package.

### Treatment effects on belief {#sec:resultsheterogeneitybelief}

Table \@ref(tab:tab-A25) contains the variable importance for the covariates used in the two final causal forests for source and congruence treatment effects on news belief. Table \@ref(tab:tab-A26) shows results for the second omnibus test described in the paper. Table \@ref(tab:tab-A27) depicts the results of the best linear projections of the two treatment effects.

\linespread{1}

```{r tab-A25, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}

var_imp_belief <- bind_rows(var_imp_source_belief, var_imp_congruence_belief) %>%
  mutate(Importance = round(Importance, 3))

knitr::kable(var_imp_belief,
  caption = "Covariate importance (source and congruence effects on belief)",
  format = "latex",
  booktabs = TRUE,
  col.names = Hmisc::capitalize(gsub("_", ": ", names(var_imp_belief))),
  linesep = ""
) %>%
  kable_styling(
    full_width = TRUE,
    latex_options = c("striped", "scale_down", "HOLD_position"),
    font_size = 9
  )
```

```{r tab-A26, echo=FALSE, results="asis"}

stargazer(calibr_source_belief, calibr_congruence_belief,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A26",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Best linear predictor test (source and congruence effects on belief)",
  align = TRUE,
  covariate.labels = c("Mean forest prediction", "Differential forest prediction"),
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE, notes = c("*p<0.05; **p<0.01; ***p<0.001."),
  header = FALSE,
  font.size = "scriptsize"
)
```

```{r tab-A27, echo=FALSE, paged.print=FALSE, results="asis"}

dimnames(blpht_source_belief)[[1]] <- recode(dimnames(blpht_source_belief)[[1]],
       "Constant" = "Intercept",
       "trust_source_mainstream" = "Mainstream source trust",
       "age" = "Age",
       "vote_choice_afd_num" = "Vote choice AfD",
       "income_num" = "Income",
       "know_politicians_total" = "Political knowledge",
       "education_num" = "Education",
       "sharing_frequency" = "Sharing Frequency",
       "overconfidence" = "Overconfidence",
       "know_source_mainstream" = "Media knowledge")
dimnames(blpht_congruence_belief)[[1]] <- recode(dimnames(blpht_congruence_belief)[[1]],
       "Constant" = "Intercept",
       "trust_source_mainstream" = "Mainstream source trust",
       "age" = "Age",
       "vote_choice_afd_num" = "Vote choice AfD",
       "income_num" = "Income",
       "know_politicians_total" = "Political knowledge",
       "education_num" = "Education",
       "sharing_frequency" = "Sharing Frequency",
       "overconfidence" = "Overconfidence",
       "know_source_mainstream" = "Media knowledge")

stargazer(blpht_source_belief,
  blpht_congruence_belief,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A27",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Best linear projection of predictions (source and congruence effects on belief)",
  align = TRUE,
  dep.var.labels = c("Belief", "Belief"),
  column.labels = c("Source effect predictions", "Congruence effect predictions"),
  covariate.labels = NULL,
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE, notes = c("*p<0.05; **p<0.01; ***p<0.001."),
  header = FALSE,
  font.size = "scriptsize"
)
```

### Treatment effects on sharing {#sec:resultsheterogeneitysharing}

```{r causal-forest-estimations-sharing, message=FALSE, warning=FALSE, include=FALSE}

# Setup results storage ####
vars_outcome <- c("share_report_1_email_num", "share_report_1_fb_num", "share_report_1_twitter_num", "share_report_1_whatsapp_num")
vars_treatment <- c("treatment_source", "treatment_congruence50")
list_data_heterogeneity <- list()
list_causalforests <- list()
list_variableimportance <- list()
list_forest_estimate_ate <- list()
list_ate_difference_source_sharing_heterogeneity <- list()
list_test_calibration_source_sharing_results <- list()
list_variable_importance_table <- list()
list_blpht_results <- list()
list_blpht_results_vars <- list()
list_blpht_results_text <- list()
list_variable_importance <- list()
list_test_calibration_source_sharing <- list()

# Loop 1 ####
for (y in vars_treatment) {
  for (z in vars_outcome) {
    if (y == "treatment_congruence50") {
      z <- gsub("report_1", "report_5", z)
    }
    yz <- paste(y, z, sep = " - ")
    list_data_heterogeneity[[yz]] <- data %>%
      select(
        z,
        y,
        sex_num,
        age,
        federal_state_west,
        education_num,
        income_num,
        turnout_num,
        vote_choice_leftparty,
        vote_choice_greens,
        vote_choice_spd,
        vote_choice_cdu_csu,
        vote_choice_fdp,
        vote_choice_afd_num,
        vote_choice_dont_know,
        know_html_correct,
        know_source_mainstream,
        trust_source_mainstream,
        use_email_num,
        use_fb_num,
        use_twitter_num,
        use_whatsapp_num,
        know_politicians_total,
        overconfidence
      ) %>%
      na.omit() %>%
      data.frame()



    library(grf)
    # Define data for causal forest function ####
    X <- list_data_heterogeneity[[yz]] %>% select(-z, -y)
    W <- list_data_heterogeneity[[yz]] %>%
      select(y) %>%
      pull()
    Y <- list_data_heterogeneity[[yz]] %>%
      select(z) %>%
      pull()

    # Estimate forest 1 ####
    tau.forest <- causal_forest(
      X = X,
      Y = Y,
      W = W,
      num.trees = number_of_trees
    )


    # Variable Importance Step 1 ####
    variable_importance <- tau.forest %>%
      variable_importance(max.depth = tree_depth) %>%
      as.data.frame() %>%
      mutate(variable = colnames(tau.forest$X.orig)) %>%
      arrange(desc(V1))

    selected.idx <- variable_importance %>%
      filter(V1 > mean(V1)) %>%
      select(variable) %>%
      pull() #

    # Estimate forest 2 ####
    tau.forest <- causal_forest(X[, selected.idx],
      Y,
      W,
      num.trees = number_of_trees
    )

    # Estimate forest ate ####
    list_forest_estimate_ate[[yz]] <- average_treatment_effect(tau.forest,
      target.sample = c("all"),
      method = c("AIPW", "TMLE"), subset = NULL
    )[1]


    # Predict CATEs/tau hat ####
    tau.hat <- predict(tau.forest, estimate.variance = TRUE)
    sigma.hat <- sqrt(tau.hat$variance.estimates)






    # Omnibus tests for heterogeneity ####

    # Compare regions with ####
    # high and low estimated CATEs
    high_effect <- tau.hat[, 1] > median(tau.hat[, 1])
    ate.high <- average_treatment_effect(tau.forest, subset = high_effect)
    ate.low <- average_treatment_effect(tau.forest, subset = !high_effect)
    list_ate_difference_source_sharing_heterogeneity[[yz]] <- paste(
      round(ate.high[1] - ate.low[1], 3), "with a 95% CI interval of ", "+/-",
      round(qnorm(0.975) * sqrt(ate.high[2]^2 + ate.low[2]^2), 3)
    )

    # Best linear predictor test (test_calibration) ####
    test_calibration_source_belief <- test_calibration(tau.forest)
    list_test_calibration_source_sharing[[yz]] <- test_calibration_source_belief
    list_test_calibration_source_sharing_results[[yz]] <-
      paste("Best linear fit using forest predictions",
        " as well as the mean forest prediction as regressors, along",
        " with one-sided heteroskedasticity-robust (HC3) SEs resulted",
        " in a mean forest prediction estimate of ",
        round(test_calibration_source_belief[1], 2),
        " and a highly significant differential forest prediction estimate of ",
        round(test_calibration_source_belief[2], 2),
        " (P-value: ",
        format(round(test_calibration_source_belief[8], 3), nsmall = 3),
        "), meaning we can reject the null of no heterogeneity.",
        sep = ""
      )

    # Variable Importance Step 2 ####
    # Display the importance of different variables
    variable_importance <- tau.forest %>%
      variable_importance(max.depth = tree_depth) %>%
      as.data.frame() %>%
      mutate(variable = colnames(tau.forest$X.orig)) %>%
      arrange(desc(V1))


    # Choose variable to visualize (9 most important)
    variable_importance <- variable_importance %>% mutate(variable_label = variable)
    variable_importance$variable_label <- recode(variable_importance$variable_label,
      "trust_source_mainstream" = "Mainstream source trust",
      "age" = "Age",
      "know_politicians_total" = "Knowing politicians",
      "income_num" = "Income",
      "know_source_mainstream" = "Knowing sources",
      "overconfidence" = "Over confidence",
      "education_num" = "Education",
      "use_twitter_num" = "Twitter usage",
      "know_html_correct" = "Knowing html",
      "sex_num" = "Gender",
      "federal_state_west" = "West Germany",
      "use_fb_num" = "Facebook usage",
      "use_whatsapp_num" = "Whatsapp usage",
      "turnout_num" = "Vote participation",
      "(Intercept)" = "Intercept",
      "use_email_num" = "Email usage",
      "vote_choice_leftparty" = "Vote choice LeftParty",
      "vote_choice_greens" = "Vote choice Greens",
      "vote_choice_spd" = "Vote choice SPD",
      "vote_choice_cdu_csu" = "Vote choice CDU/CSU",
      "vote_choice_fdp" = "Vote choice FDP",
      "vote_choice_afd_num" = "Vote choice AfD",
      "vote_choice_dont_know" = "Vote choice Dont know",
    )

    variable_importance$outcome_label <- z
    variable_importance$outcome <- z
    # Add variable type
    variable_importance$type <- recode(variable_importance$variable,
      "trust_source_mainstream" = "continuous",
      "age" = "continuous",
      "know_politicians_total" = "categorical",
      "income_num" = "categorical",
      "know_source_mainstream" = "categorical",
      "overconfidence" = "categorical",
      "education_num" = "categorical",
      "use_twitter_num" = "categorical",
      "know_html_correct" = "categorical",
      "sex_num" = "categorical",
      "federal_state_west" = "categorical",
      "use_fb_num" = "categorical",
      "use_whatsapp_num" = "categorical",
      "turnout_num" = "categorical",
      "(Intercept)" = "categorical",
      "use_email_num" = "categorical",
      "vote_choice_leftparty" = "categorical",
      "vote_choice_greens" = "categorical",
      "vote_choice_spd" = "categorical",
      "vote_choice_cdu_csu" = "categorical",
      "vote_choice_fdp" = "categorical",
      "vote_choice_afd_num" = "categorical",
      "vote_choice_dont_know" = "categorical",
    )

    list_variable_importance[[yz]] <- variable_importance

    # TABLE: Variable importance (Appendix)
    list_variable_importance_table[[yz]] <- variable_importance %>%
      arrange(outcome_label, desc(V1)) %>%
      group_by(outcome) %>%
      # top_n(n = 6, V1) %>%
      ungroup() %>%
      rename(importance = V1) %>%
      select(outcome_label, variable_label, importance) %>%
      rename(Outcome = outcome_label, Covariate = variable_label, Importance = importance) %>%
      mutate(Importance = round(Importance, 2))



    # Best linear projection heterogeneity test (blpht) ####
    list_blpht_results[[yz]] <- best_linear_projection(tau.forest, X[, selected.idx])

    # Use the projection of the current forest, not the belief model
    x <- tidy(list_blpht_results[[yz]]) %>%
      filter(p.value < 0.1) %>%
      rename("variable" = "term") %>%
      left_join(variable_importance, by = "variable")

    list_blpht_results_vars[[yz]] <- paste(x$variable_label, collapse = ", ")

    list_blpht_results_text[[yz]] <- paste(paste(x$variable_label,
      " (Coef.:", round(x$estimate, 2),
      ", P-val.:", format(round(x$p.value, 2), nsmall = 2),
      ")",
      sep = ""
    ), collapse = ", ")


    # Data for plots ####
    list_data_heterogeneity[[yz]] <- bind_cols(list_data_heterogeneity[[yz]], tau.hat)
  }
}





# Merge list elements ####

# Variable importance ####
# Identify the 8 most important variables across all sharing outcomes
# Simply calculate the sum of variable importance
variable_importance_all <- bind_rows(list_variable_importance, .id = "model")
variable_importance_all <- variable_importance_all %>% mutate(model_label = outcome)
variable_importance_all$outcome_label <- recode(variable_importance_all$outcome_label,
  "share_report_1_email_num" = "Sharing Rep. 1 per email",
  "share_report_1_fb_num" = "Sharing Rep. 1 per Facebook",
  "share_report_1_twitter_num" = "Sharing Rep. 1 per Twitter",
  "share_report_1_whatsapp_num" = "Sharing Rep. 1 per Whatsapp",
  "share_report_5_email_num" = "Sharing Rep. 5 per email",
  "share_report_5_fb_num" = "Sharing Rep. 5 per Facebook",
  "share_report_5_twitter_num" = "Sharing Rep. 5 per Twitter",
  "share_report_5_whatsapp_num" = "Sharing Rep. 5 per Whatsapp"
)
variable_importance_all$treatment <- gsub(" .*$", "", variable_importance_all$model)




# TABLE: Variable importance (Appendix)
variable_importance_table_sharing <- variable_importance_all %>%
  arrange(model, desc(V1)) %>%
  group_by(model) %>%
  # top_n(n = 6, V1) %>%
  ungroup() %>%
  rename(importance = V1) %>%
  select(treatment, outcome_label, variable_label, importance) %>%
  rename(
    Outcome = outcome_label, Covariate = variable_label, Importance = importance,
    Treatment = treatment
  ) %>%
  mutate(Importance = round(Importance, 2)) %>%
  mutate(Treatment = recode(Treatment,
    "treatment_congruence50" = "Congruence treatment",
    "treatment_source" = "Source treatment"
  ))


# Omnibus tests for heterogeneity ####
# Compare regions with ####
forest <- names(list_ate_difference_source_sharing_heterogeneity)
results <- as.character(list_ate_difference_source_sharing_heterogeneity)
table_compare_regions <- bind_cols(forest = forest, Results = results) %>%
  mutate(Treatment = ifelse(str_detect(forest, "treatment_source"), "Treatment source", "Treatment congruence")) %>%
  mutate(Outcome = sub(".* - ", "", forest)) %>%
  mutate(Outcome = recode(Outcome,
    "share_report_1_email_num" = "Sharing Rep. 1 per email",
    "share_report_1_fb_num" = "Sharing Rep. 1 per Facebook",
    "share_report_1_twitter_num" = "Sharing Rep. 1 per Twitter",
    "share_report_1_whatsapp_num" = "Sharing Rep. 1 per Whatsapp",
    "share_report_5_email_num" = "Sharing Rep. 5 per email",
    "share_report_5_fb_num" = "Sharing Rep. 5 per Facebook",
    "share_report_5_twitter_num" = "Sharing Rep. 5 per Twitter",
    "share_report_5_whatsapp_num" = "Sharing Rep. 5 per Whatsapp"
  )) %>%
  select(Treatment, Outcome, Results)
table_compare_regions$CI_95_low <- as.numeric(gsub(" .*$", "", table_compare_regions$Results)) - as.numeric(gsub("^.* ", "", table_compare_regions$Results))
table_compare_regions$CI_95_high <- as.numeric(gsub(" .*$", "", table_compare_regions$Results)) + as.numeric(gsub("^.* ", "", table_compare_regions$Results))
table_compare_regions$Results <- as.numeric(gsub(" .*$", "", table_compare_regions$Results))
table_compare_regions <- table_compare_regions %>%
  mutate(covers_0 = 0 >= CI_95_low & 0 <= CI_95_high)
table_compare_regions$Results[table_compare_regions$covers_0 == FALSE] <- paste(table_compare_regions$Results[table_compare_regions$covers_0 == FALSE], "*", sep = "")
# table_compare_regions$Results[1:3] <- paste(table_compare_regions$Results[1:3], "*", sep="")

table_compare_regions <- table_compare_regions %>%
  rename(Difference_estimate = Results) %>%
  select(-covers_0)
# names(table_compare_regions) <- Hmisc::capitalize(gsub("_", " ", names(table_compare_regions)))
table_compare_regions <- data.frame(table_compare_regions)


# Best linear predictor test (test_calibration) ####
# list_test_calibration_source_sharing_results

forest <- names(list_test_calibration_source_sharing_results)
results <- as.character(list_test_calibration_source_sharing_results)
table_blp_analysis <- bind_cols(forest = forest, Results = results) %>%
  mutate(Treatment = ifelse(str_detect(forest, "treatment_source"), "Treatment source", "Treatment congruence")) %>%
  mutate(Outcome = sub(".* - ", "", forest)) %>%
  mutate(Outcome = recode(Outcome,
    "share_report_1_email_num" = "Sharing Rep. 1 per email",
    "share_report_1_fb_num" = "Sharing Rep. 1 per Facebook",
    "share_report_1_twitter_num" = "Sharing Rep. 1 per Twitter",
    "share_report_1_whatsapp_num" = "Sharing Rep. 1 per Whatsapp",
    "share_report_5_email_num" = "Sharing Rep. 5 per email",
    "share_report_5_fb_num" = "Sharing Rep. 5 per Facebook",
    "share_report_5_twitter_num" = "Sharing Rep. 5 per Twitter",
    "share_report_5_whatsapp_num" = "Sharing Rep. 5 per Whatsapp"
  )) %>%
  select(Treatment, Outcome, Results)


# Best linear projection heterogeneity test (blpht) ####
names(list_blpht_results) <- gsub("treatment_source", "Treatment source", names(list_blpht_results))
names(list_blpht_results) <- gsub("treatment_congruence50", "Treatment congruence", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_1_email_num", "Sharing Rep. 1 per email", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_1_fb_num", "Sharing Rep. 1 per Facebook", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_1_twitter_num", "Sharing Rep. 1 per Twitter", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_1_whatsapp_num", "Sharing Rep. 1 per Whatsapp", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_5_email_num", "Sharing Rep. 5 per email", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_5_fb_num", "Sharing Rep. 5 per Facebook", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_5_twitter_num", "Sharing Rep. 5 per Twitter", names(list_blpht_results))
names(list_blpht_results) <- gsub("share_report_5_whatsapp_num", "Sharing Rep. 5 per Whatsapp", names(list_blpht_results))
```

Below we discuss treatment heterogeneity regarding sharing intentions. As described in Section \@ref(sec:results), we only found significant source and congruence effects on Facebook sharing intentions, but not for the other platforms. To explore heterogeneity in sharing outcomes, we pursue the same steps as for the belief outcome. For each of the four sharing outcomes and each of the two treatments, we grow a causal forest using only those covariates that had above-average importance in a pilot forest. This results in eight forests. We start with two omnibus tests that generally gauge whether heterogeneity is present or not. Table \@ref(tab:tab-A28) depicts the results for the first omnibus test comparing regions of high and low treatment effect predictions. For each effect, the table displays the difference between the average predictions of the two regions, as well as a confidence interval for this difference. The test indicates that there is no treatment heterogeneity.

\setstretch{.8}

```{r tab-A28, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}
x <- table_compare_regions
library("xtable")
knitr::kable(x,
  caption = "Comparing prediction regions (source and congruence effect on sharing)",
  format = "latex",
  booktabs = TRUE,
  col.names = Hmisc::capitalize(gsub("_", " ", names(x))),
  linesep = ""
) %>%
  kable_styling(
    full_width = TRUE,
    latex_options = c("striped", "scale_down", "HOLD_position"),
    font_size = 9
  ) %>%
  footnote(general = "* 95% confidence interval for estimate does not cover 0.") %>%
  column_spec(1, width = "3.5cm") %>%
  column_spec(2, width = "4.5cm")
```

\setstretch{1}

Table \@ref(tab:tab-A29) shows results for the second omnibus test, which uses the best linear predictor method (cf. Section \@ref(sec:thoverview)). A positive and statistically significant coefficient for the differential forest prediction would be indicative of relevant heterogeneity. Again, this is not the case for any of the sharing outcomes. As a consequence, we do not pursue the exploration of treatment heterogeneity for our sharing outcomes any further.

```{r tab-A29, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}
# table_blp_analysis = list_test_calibration_source_sharing

column_labels <- Hmisc::capitalize(gsub("share_report_1_|_num|share_report_5_", "", sub(".* - ", "", names(list_test_calibration_source_sharing))))

stargazer(list_test_calibration_source_sharing,
  type = "latex",
  omit.stat = c("LL", "ser", "f", "adj.rsq"),
  ci = FALSE, digits = 2,
  ci.level = 0.95,
  single.row = FALSE,
  label = "tab:tab-A29",
  table.placement = "!ht",
  column.sep.width = "1pt",
  title = "Best linear predictor tests (source and congruence effects on sharing)",
  align = TRUE,
  dep.var.caption = "Outcome: Sharing via",
  covariate.labels = c("Mean forest prediction", "Differential forest prediction"),
  column.labels = column_labels, # , "M3", "M4*"
  model.names = FALSE,
  model.numbers = FALSE,
  star.cutoffs = c(0.05, 0.01, 0.001), star.char = c("*", "**", "***"),
  notes.append = FALSE, notes = c("*p<0.05; **p<0.01; ***p<0.001."),
  header = FALSE,
  font.size = "scriptsize"
)
```






\newpage

## Saliency of topics {#sec:saliency}

To assess how salient the topics of our news reports were before, during and after our data collection period, we rely on Google Trends data. @Mellon2013-aj discusses the use of Google Trends data to measure issue saliency. While Google does not provide absolute numbers of searches, the respective API returns the relative prevalence of a search term as compared to others over time. Figures \@ref(fig-A19) and \@ref(fig-A20) show different combinations of search terms and time periods. The y-axis always shows the prevalence of searches relative to each other during the depicted time period and in the territory of Germany. The most searched term on day X is used as the maximum (= 100), i.e., to anchor the scale. The data has been aggregated to the week level. The x-axis displays the corresponding weeks.
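
The anchoring and weekly aggregation can be sketched in base R as follows. This is an illustration under stated assumptions, not Google's actual normalization: the raw daily counts and the names `term_a` and `term_b` are invented, since Google does not publish absolute search volumes.

```r
# Hypothetical raw daily search volumes for two terms over two weeks
# (Google does not publish such counts; the numbers are invented).
raw <- data.frame(
  date = seq(as.Date("2019-03-11"), by = "day", length.out = 14),
  term_a = rep(c(40, 100), each = 7),
  term_b = rep(c(200, 400), each = 7)
)

# Anchor the scale: the largest value across all terms and days becomes 100
terms <- c("term_a", "term_b")
peak <- max(as.matrix(raw[, terms]))
index <- raw
index[, terms] <- 100 * raw[, terms] / peak

# Monday of the week each day belongs to ("%u" is the ISO weekday, Monday = 1)
index$week_start <- index$date - (as.integer(format(index$date, "%u")) - 1)

# Weekly means of the daily index values
weekly <- aggregate(cbind(term_a, term_b) ~ week_start, data = index, FUN = mean)
```

Because all terms share one anchor, the resulting values are comparable across terms within the chosen period and territory, but not across separate queries with different terms or time spans.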

Figure \@ref(fig-A19) displays Google Trends data for the search terms "Einwanderung", "Flüchtlinge", "Asyl" and "Migration". Essentially, it shows that the salience of the issue areas we investigate is relatively stable across the months that include our data collection. Figure \@ref(fig-A20) extends the time span. First, we can see that the refugee issue was highly salient during 2015/2016 as compared to 2019. We added the events that are potentially responsible for the surge in searches, namely the onset of the refugee crisis in Germany as well as the events around New Year's Eve in Cologne [@Hewitt2016-ni]. Second, we added another, completely unrelated but salient and time-bound event, namely the death of the famous German designer "Lagerfeld", who died on February 19, 2019, which resulted in a massive spike in searches. The spike that "Lagerfeld" generates, together with the higher saliency in 2015/2016 in Figure \@ref(fig-A20), further assures us that saliency regarding the issues contained in our news reports was relatively stable.

```{r fig-A19, echo=FALSE, fig.cap="Trends in Google searches\\label{fig-A19}", fig.height=3.5, message=FALSE, warning=FALSE, out.extra=''}

# Words to search for
search.words <- c("Einwanderung", "Flüchtlinge", "Asyl", "Migration")

# Download google trends
data_google_trends <- gtrends(search.words,
  gprop = "web",
  time = "2018-12-31 2019-06-03",
  geo = "DE"
)[[1]]
write_csv(data_google_trends, "output/data_google_trends.csv")

data_google_trends <- data_google_trends %>%
  pivot_wider(names_from = c("keyword", "geo"), values_from = "hits") %>%
  dplyr::select(-time, -gprop, -category)

# Replace "<1" with 0
data_google_trends <- data_google_trends %>%
  mutate_all(~ str_replace(., "<1", "0"))

# Convert date variable
data_google_trends$date <- as.Date(data_google_trends$date, "%Y-%m-%d")

# Mutate factor to numeric and reorder
data_google_trends <- data_google_trends %>%
  mutate_if(is.factor, as.character) %>%
  mutate_if(is.character, as.numeric)


names(data_google_trends) <- str_replace_all(names(data_google_trends), "ü", "ue")

# Aggregate to weekly means
data_google_trends <- data_google_trends %>%
  mutate(
    week = week(date),
    week_start = floor_date(date, "weeks", week_start = 1),
    week_end = ceiling_date(date, "weeks", week_start = 1)
  ) %>%
  group_by(week_start) %>%
  summarise(
    date = first(date),
    week_end = first(week_end),
    Einwanderung_DE = mean(Einwanderung_DE),
    Fluechtlinge_DE = mean(Fluechtlinge_DE),
    Asyl_DE = mean(Asyl_DE),
    Migration_DE = mean(Migration_DE)
  )

ggplot(
  data = data_google_trends,
  aes(x = week_start)
) +
  geom_rect(aes(fill = "fieldperiod"),
    xmin = as.Date("2019-03-14", "%Y-%m-%d"),
    xmax = as.Date("2019-03-29", "%Y-%m-%d"),
    ymin = 0, ymax = 100, alpha = 0.2
  ) +
  geom_line(aes(x = week_start, y = Einwanderung_DE, color = "Einwanderung")) +
  geom_line(aes(x = week_start, y = Fluechtlinge_DE, color = "Fluechtlinge")) +
  geom_line(aes(x = week_start, y = Asyl_DE, color = "Asyl")) +
  geom_line(aes(x = week_start, y = Migration_DE, color = "Migration")) +
  theme_light() +
  ylab("Searches (100 = max. interest\nin time period/territory)") +
  xlab("Weekly averages (2019)") +
  scale_colour_manual(name = "Search terms", values = c(
    Einwanderung = "darkgreen",
    Fluechtlinge = "black",
    Asyl = "red",
    Migration = "yellow"
  )) +
  scale_fill_manual(
    name = "Data collection",
    values = c(fieldperiod = "gray"),
    labels = c("Field period")
  ) +
  scale_x_date(
    date_breaks = "1 week",
    date_labels = "%Y-%m-%d" # ,
    # limits = c(as.Date("2018-12-31"), as.Date("2019-06-03"))
  ) +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.background = element_rect(fill = adjustcolor("white", alpha.f = 0.7)),
    legend.key = element_rect(fill = adjustcolor("white", alpha.f = 0.7), color = NA)
  )
```

```{r fig-A20, echo=FALSE, fig.cap="Trends in Google searches\\label{fig-A20}", fig.height=3.5, message=FALSE, warning=FALSE, out.extra=''}

# Words to search for
search.words <- c("Einwanderung", "Flüchtlinge", "Asyl", "Migration", "Lagerfeld")

# Download google trends
data_google_trends <- gtrends(search.words,
  gprop = "web",
  time = "2014-12-31 2019-06-03",
  geo = "DE"
)[[1]]
write_csv(data_google_trends, "output/data_google_trends2.csv")



data_google_trends <- data_google_trends %>%
  pivot_wider(names_from = c("keyword", "geo"), values_from = "hits") %>%
  dplyr::select(-time, -gprop, -category)

# Replace "<1" with 0
data_google_trends <- data_google_trends %>%
  mutate_all(~ str_replace(., "<1", "0"))




# Convert date variable
data_google_trends$date <- as.Date(data_google_trends$date, "%Y-%m-%d")

# Mutate factor to numeric and reorder
data_google_trends <- data_google_trends %>%
  mutate_if(is.factor, as.character) %>%
  mutate_if(is.character, as.numeric)


names(data_google_trends) <- str_replace_all(names(data_google_trends), "ü", "ue")

# Aggregate to weekly means
data_google_trends <- data_google_trends %>%
  mutate(
    week = week(date),
    week_start = floor_date(date, "weeks", week_start = 1),
    week_end = ceiling_date(date, "weeks", week_start = 1)
  ) %>%
  group_by(week_start) %>%
  summarise(
    date = first(date),
    week_end = first(week_end),
    Einwanderung_DE = mean(Einwanderung_DE),
    Fluechtlinge_DE = mean(Fluechtlinge_DE),
    Asyl_DE = mean(Asyl_DE),
    Migration_DE = mean(Migration_DE),
    Lagerfeld_DE = mean(Lagerfeld_DE)
  )
write_csv(data_google_trends, "output/data_google_trends_saliency2.csv")



ggplot(
  data = data_google_trends,
  aes(x = week_start)
) +
  geom_rect(aes(fill = "fieldperiod"),
    xmin = as.Date("2019-03-14", "%Y-%m-%d"),
    xmax = as.Date("2019-03-29", "%Y-%m-%d"),
    ymin = 0, ymax = 100, alpha = 0.2
  ) +
  geom_line(aes(x = week_start, y = Einwanderung_DE, color = "Einwanderung")) +
  geom_line(aes(x = week_start, y = Fluechtlinge_DE, color = "Fluechtlinge")) +
  geom_line(aes(x = week_start, y = Asyl_DE, color = "Asyl")) +
  geom_line(aes(x = week_start, y = Migration_DE, color = "Migration")) +
  geom_line(aes(x = week_start, y = Lagerfeld_DE, color = "Lagerfeld")) +
  theme_light() +
  ylab("Searches (100 = max. interest\nin time period/territory)") +
  xlab("Weekly averages (2015-2019)") +
  scale_colour_manual(name = "Search terms", values = c(
    Einwanderung = "darkgreen",
    Fluechtlinge = "black",
    Asyl = "red",
    Migration = "yellow",
    Lagerfeld = "orange"
  )) +
  scale_fill_manual(
    name = "Data collection",
    values = c(fieldperiod = "gray"),
    labels = c("Field period")
  ) +
  new_scale_color() +
  geom_vline(aes(
    xintercept = as.Date("2015-09-07"),
    linetype = "dashed"
  )) +
  geom_vline(aes(
    xintercept = as.Date("2015-12-31"),
    linetype = "dotted"
  )) +
  geom_vline(aes(
    xintercept = as.Date("2019-02-19"),
    linetype = "twodash"
  )) +
  scale_linetype_manual(
    name = "Events",
    values = c(
      "dashed",
      "dotted",
      "twodash"
    ),
    labels = c(
      "Refugee crisis\n(Summer 2015)",
      "New Year's Eve assaults\n(2015/12/31)",
      "Lagerfeld's death\n(2019/02/19)"
    )
  ) +
  scale_x_date(
    date_breaks = "8 weeks",
    date_labels = "%Y-%m-%d" # ,
    # limits = c(as.Date("2018-12-31"), as.Date("2019-06-03"))
  ) +
  theme(
    legend.position = c(.80, .99),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.box = "horizontal",
    legend.direction = "vertical",
    legend.margin = margin(6, 6, 6, 6),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.background = element_rect(fill = adjustcolor("white", alpha.f = 0.7)),
    legend.key = element_rect(fill = adjustcolor("white", alpha.f = 0.7), color = NA),
    legend.key.size = unit(0.6, "cm")
  )
```





## R session info 

\linespread{1}

```{r print-session, echo=FALSE}
print(sessionInfo(), local = FALSE)
```

\linespread{1.6}

\newpage

## Preregistration {#sec:preregistration}
This study was preregistered before the start of data collection on the 12th of March, 2019 (see https://osf.io/q2ucj). Below we briefly discuss the main differences between the preregistration and the present study:

* The preregistration report contains a number of hypotheses that are not the focus of the present paper. Accordingly, the hypotheses in the preregistration were renamed as follows for our study: H2a, H2b &rarr; H1a, H1b; H5a, H5b &rarr; H2a, H2b; H6a, H6b &rarr; H3a, H3b.
* There are also terminological differences. In the preregistration plan we generally used known/unknown, sometimes existing/non-existing, to designate our two sources. In our study we now generally write of real/fake sources (taking the perspective of the researcher, who is aware of their status).


## References

<div id="refs_app"></div>
